Algebraic tools in statistics have recently been receiving special attention and a number of
interactions between algebraic geometry and computational statistics have been rapidly
developing. This paper presents another such connection, namely, one between probabilistic models
invariant under a finite group of (non-singular) linear transformations and polynomials invariant
under the same group. Two specific aspects of the connection are discussed: generalization of
the (uniqueness part of the multivariate) problem of moments and log-linear, or toric, modeling
by expansion of invariant terms. A distribution of minuscule subimages extracted from a large
database of natural images is analyzed to illustrate the above concepts.

1. Introduction

Suppose frequency data $n_\omega$ are indexed by 2 × 2 matrices $\omega \in \Omega = \mathcal{M}_{2\times 2}([L])$ with integer
levels 1, 2, ..., L. Suppose, further, that the frequencies observed within certain subsets
O of $\Omega$ appear to be very similar: $n_\omega \approx n_{\omega'}$ for all $\omega, \omega' \in O$. This suggests data reduction
by lifting the analysis to the quotient space $S_\Omega = \{O\}$ of the equivalence classes O. In
particular, if the original and derived models are parameterized by the point $p_\omega$ and
class $p_O$ probabilities, respectively, then their maximum likelihood estimates under the
above equality hypothesis are related as $\hat{p}_\omega = \sum_{\omega' \in O} n_{\omega'} / (N|O|) = \hat{p}_O / |O|$, where $|O|$ and
N stand for the cardinality of the set O and the total count $\sum_{\omega \in \Omega} n_\omega$, respectively.
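
The pooling of counts within classes described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `pooled_mle`, the toy counts and the choice of classes {x, -x} on a one-dimensional index set are hypothetical and are not taken from the data analyzed in this paper. The dictionary of counts is assumed to list every point of the index set, including points with zero count, so that the class sizes |O| are correct.

```python
from collections import defaultdict

def pooled_mle(counts, orbit_of):
    """Point-probability MLEs when probabilities are equal within each class O.

    counts   : dict mapping each point omega to its observed frequency n_omega
               (must list every point of the index set, zero counts included)
    orbit_of : function mapping a point to a hashable label of its class O
    """
    total = sum(counts.values())        # N, the total count
    orbit_count = defaultdict(int)      # sum of n_omega over each class O
    orbit_size = defaultdict(int)       # |O|
    for omega, n in counts.items():
        label = orbit_of(omega)
        orbit_count[label] += n
        orbit_size[label] += 1
    # p_hat_omega = (sum of counts in O) / (N * |O|) = p_hat_O / |O|
    return {
        omega: orbit_count[orbit_of(omega)] / (total * orbit_size[orbit_of(omega)])
        for omega in counts
    }

# Hypothetical toy data on {-2, ..., 2} with classes {x, -x}.
counts = {-2: 3, -1: 10, 0: 24, 1: 12, 2: 1}
p_hat = pooled_mle(counts, orbit_of=abs)
```

The estimates are constant on each class and still sum to one, which is the data reduction referred to in the text.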

This work has been motivated by a wish to better understand implications of such
equality constraints on common probabilistic and statistical models when the constraints
are linked to the structure of $\Omega$. Certainly, the above equality hypothesis can be formulated
with an arbitrary indexing set $\Omega$, except that finiteness of classes O can then no longer be
taken for granted. However, we will focus on the case where $\Omega \subset \mathbb{R}^m$ and, in particular,
where the level sets of each of the m factors are ordered. One motivation is that, in
practice, the levels result from an aggregation which, even when implicit, can still be
important for inference. For example, multiple data sets of the above kind may be related
to each other as each of them corresponds to a different discretization, or quantization, of
the same phenomenon. In particular, as coarser $\Omega$'s are refined, corresponding statistical
models should admit appropriate extensions [36]. This is also the context of our real data
example in Section 5.

We thus consider the following context. A family of related modeling frameworks, each
with its own $\Omega \subset \mathbb{R}^m$, all include certain equality hypotheses which are of the same origin.
Hereafter, we refer collectively to all of these hypotheses as "G-invariance" for reasons
to be made clear shortly. Our main question is: What tools are suitable for simultaneously
representing and operating with all G-invariant members of these frameworks? First,
we need to explain that by "the same origin", we mean a finite subgroup G of the group
of invertible linear transformations $GL(m,\mathbb{R})$ of $\mathbb{R}^m$. In effect, G is then necessarily
(isomorphic to) a finite subgroup of the orthogonal group $O(m,\mathbb{R}) < GL(m,\mathbb{R})$. We also
need to assume that the $\Omega$'s in all of the allowed frameworks (possibly including $\Omega = \mathbb{R}^m$)
are fixed or invariant under (the transformations in) G. If we identify $\Omega$ with a
geometric figure, then G is a subgroup of the full symmetry group of $\Omega$. (Note that any
finite subgroup of $E(m) \cong O(m,\mathbb{R}) \ltimes \mathbb{R}^m$, the group of isometries of $\mathbb{R}^m$, is necessarily
(isomorphic to) a subgroup of $O(m,\mathbb{R})$.)

Our primary objects of interest are G-invariant probability distributions $\mathcal{P}^G$ on $\Omega$.
(Being invariant under a group of transformations simply means that a measure assigns
the same mass to the set B and to all of the transformed images gB of B for all $g \in G$.) A
trivial example is m = 1 and $G = \langle -1 \rangle$, in which case $\Omega$ must be invariant under
multiplication by −1 and the density of any G-invariant continuous distribution must depend on
x via $x^2$. Note that if we also allow G to act on polynomials $q(x) \in \mathbb{R}[x]$ via $q(x) \mapsto q(-x)$,
then the G-invariant polynomials, $q(x) = q(-x)$, must necessarily be polynomials in $x^2$.
It is then said that $x^2$ generates the ring $\mathbb{R}[x]^G$ of G-invariant polynomials.

The theory of polynomial invariants of finite groups [6, 11, 44] provides the following
basis for a positive answer to our main question: $\mathbb{R}[x_1, x_2, \dots, x_m]^G$ always has a
finite set of generators $f_1, f_2, \dots, f_N$. Subsequently, for G-invariant measures, mixed
G-invariant moments, that is, moments of $f_1^{\alpha(1)} f_2^{\alpha(2)} \cdots f_N^{\alpha(N)}$, play the role of the ordinary
mixed moments of $x_1^{\alpha(1)} x_2^{\alpha(2)} \cdots x_m^{\alpha(m)}$ for an arbitrary measure. Moreover, since all functions
on a finite $\Omega$ are essentially polynomials, all G-invariant functions on $\Omega$ are essentially
G-invariant polynomials.

Finding such generators (and possible algebraic relations among them) is not a trivial 
task, in general. Fortunately, efficient algorithms for such computations have recently 
been developed (see, e.g., [11, 44, 46]). Moreover, there are also widely available computer 
algebra tools such as Gap [47], INVAR [29], Macaulay2 [20], Magma [4], to name but a 
few, implementing those algorithms. 

The remainder of this paper is organized as follows. In Section 2, we briefly introduce
our main algebraic ingredients that might not be very familiar to the general audience.
Next, we discuss two implications of G-invariance: first, in Section 3, we show how the
uniqueness part of the multivariate moment problem [9] generalizes in the presence of
G-invariance and second, in order to illustrate practical relevance of the above observations,
in Section 4, we incorporate G-invariance into a concrete modeling approach based
on sequential polynomial expansions of log-densities [1]. Specifically, we first outline in
Sections 4.2-4.3 a lookahead version of log-linear model construction in the presence of
G-invariance. We then apply this routine to real data in Section 5, with $\Omega = \mathcal{M}_{2\times 2}([L])$,
as in the beginning of this section. (Although such automatic routines should be accompanied
by model selection considerations, we do not discuss this here.) Apparently, $\Omega$ can
be identified with the square-base cuboid and is hence invariant under the 16 transformations
of the full symmetry group G. This example originates from [30], where natural
microimage frequency data were observed to be nearly G-invariant, largely independent
of experimental (image preprocessing) conditions and sampling schemes. Here, we "let
the data speak" by providing our greedy lookahead model constructor (Section 4.3) at
each step with large sets of ordinary and G-invariant terms for possible inclusion in
the expanded model. G-invariant terms are immediately selected at the first steps for
delivering best fit.

We conclude in Section 6 by commenting on computational issues related to modeling 
in the presence of G-invariance. Finally, an extensive account of symmetries in probability, 
statistics and physics with many examples and exercises appears in [49] . This work could 
perhaps complement [49] by bringing in polynomial invariants of finite groups (Section 2), 
the connection to the problem of moments (Section 3), a certain information-theoretic 
flavor (Section 4) and a significant example from the natural image statistics (Section 5). 

2. G-invariance and its polynomial generators 

Let a group G act on a set A, and write ga for the image of $a \in A$ under $g \in G$ in this
action. (For an introduction to the concept of group action, see [13].)

Definition 1. $B \subset A$ is fixed under G, or G-invariant, if for all $b \in B$ and all $g \in G$,
$gb \in B$.

Any G action on A extends to a G action on $\mathbb{R}^A$, the set of all real-valued functions
on A: $(gf)(a) = f(g^{-1}a)$, where $g \in G$, $f \in \mathbb{R}^A$ and $a \in A$. Let us be more concrete
and have a finite group G act on $W = \mathbb{R}^m$ in a way that admits a linear (matrix)
representation $\rho: G \to GL(W)$ ($= GL(m,\mathbb{R})$). We will simply identify the original action
of G on W with its matrix representation, $\rho$, and will therefore think of $g \in G$ as an
m × m matrix.

Proposition 1. The following group actions are well defined: 

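(1) the (restricted) action of G on a G-invariant $\Omega \subset W$;

(2) the G action on $\mathcal{B}$, the Borel $\sigma$-algebra on $\Omega$, $gB = \{g\omega : \omega \in B\}$;

(3) the G action on $\mathcal{M}$, the set of (positive) measures on $\mathcal{B}$,

$$(gP)(B) = P(g^{-1}B), \quad B \in \mathcal{B},\ P \in \mathcal{M}; \tag{2.1}$$

(4) the G action on $\mathbb{R}[x_1, x_2, \dots, x_m]$, the set of real polynomials in m indeterminates,

$$(gf)(v) = f(g^{-1}v), \quad \text{where } g \in G,\ f \in \mathbb{R}[x_1, x_2, \dots, x_m] \text{ and } v \in W. \tag{2.2}$$
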
The equivalence classes $O \in S_\Omega$ of the G action on $\Omega$ and their set $S_\Omega = \Omega/G$ are referred
to as orbits and the orbit space, respectively.

For convenience, we will be writing $E_P h(X)$ for $\int_W h(x)\,dP(x)$ for any $P \in \mathcal{M}$ (and any
measurable $h: W \to \mathbb{R}$), as if $X = (X_1, X_2, \dots, X_m)$ were a random vector distributed
according to P.

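The multiindex notation $f^\alpha$ for $f \in \mathbb{R}^N$ and $\alpha = (\alpha(1), \alpha(2), \dots, \alpha(N)) \in \mathbb{N}^N$ means
$f_1^{\alpha(1)} \cdots f_N^{\alpha(N)}$ and, in particular, $X^\alpha = X_1^{\alpha(1)} \cdots X_m^{\alpha(m)}$. Here, $\mathbb{N} = \{0, 1, 2, \dots\}$.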

We will need the following sets of G-invariant measures on B. 

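Definition 2. $\mathcal{M}^G = \{P \in \mathcal{M} : gP = P\ \forall g \in G\}$ and $\mathcal{M}_*^G = \mathcal{M}^G \cap \mathcal{M}_*$, where
$\mathcal{M}_* = \{P \in \mathcal{M} : E_P|X^\alpha| < \infty\ \forall \alpha \in \mathbb{N}^m\}$.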

Other useful invariant objects include the following: $\mathcal{P}^G$, the set of invariant probability
measures (pm) on $\Omega$; $(\mathbb{R}^\Omega)^G$, the set of invariant real functions on $\Omega$; $\mathcal{B}^G$, the $\sigma$-algebra
of invariant Borel sets; $\mathbb{R}[x_1, x_2, \dots, x_m]^G$, the ring, and algebra, of invariant polynomials
on W. The following operator projects $\mathbb{R}^\Omega$, the linear space of real functions on $\Omega$, onto
$(\mathbb{R}^\Omega)^G$, the linear subspace of G-invariant real functions on $\Omega$, and plays a key role in
the ensuing development:

$$\mathcal{R}(f) = \frac{1}{|G|} \sum_{g \in G} gf. \tag{2.3}$$

We will also be using the restricted operator $\mathcal{R}: \mathbb{R}[x_1, x_2, \dots, x_m] \to \mathbb{R}[x_1, x_2, \dots, x_m]^G$
and the adjoint $\mathcal{R}^*: \mathcal{M} \to \mathcal{M}^G$, given by

$$\mathcal{R}^*(P) = \frac{1}{|G|} \sum_{g \in G} gP. \tag{2.4}$$

The following statements follow from the fact that for all $g \in G$, $\det(g) = \pm 1$.

Proposition 2. Let $P \in \mathcal{M}$ have a density p relative to some reference measure $\mu$. $\mathcal{R}(p)$
is then a density of $\mathcal{R}^*(P)$ relative to $\mu$. Also, if p is a density of a G-invariant measure
P relative to $\mu$, then p is $\mu$-a.e. G-invariant.

In polynomial algebra, the averaging map (2.3) is called the Reynolds operator [6, 44].
The orbit-averaging feature of this operator is apparent from its definition and the following
property further underlines the connection with probabilistic averaging: for all
$f \in \mathbb{R}^\Omega$ and all $h \in (\mathbb{R}^\Omega)^G$, $\mathcal{R}(hf) = h\mathcal{R}(f)$, that is, a random variable measurable
relative to the $\sigma$-algebra on which conditioning is performed almost surely commutes with
the conditional expectation.
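
The orbit averaging in (2.3) and the property $\mathcal{R}(hf) = h\mathcal{R}(f)$ can be checked numerically on a small example. The following is a minimal sketch, assuming a group given explicitly as a list of matrices; the helper name `reynolds` and the choice of the sign-change group on $\mathbb{R}^2$ are illustrative assumptions rather than part of the original development.

```python
def reynolds(f, group):
    """Return the orbit average R(f)(v) = (1/|G|) * sum over g of f(g^{-1} v).

    group : list of square matrices (lists of rows) forming a finite group.
            A group contains g^{-1} together with g, so summing f(g v) over
            the whole list gives the same average as summing f(g^{-1} v).
    """
    def apply(g, v):
        return tuple(sum(g[i][j] * v[j] for j in range(len(v)))
                     for i in range(len(g)))

    def averaged(v):
        return sum(f(apply(g, v)) for g in group) / len(group)

    return averaged

# Assumed example group: independent sign changes of the two coordinates (order 4).
G = [[[s1, 0], [0, s2]] for s1 in (1, -1) for s2 in (1, -1)]

f = lambda v: v[0] + v[0] * v[1] + v[0] ** 2   # not G-invariant
h = lambda v: v[1] ** 2                        # G-invariant

Rf = reynolds(f, G)                            # equals x1^2 for this f
Rhf = reynolds(lambda v: h(v) * f(v), G)       # equals h * R(f)
```

In this example the odd terms $x_1$ and $x_1 x_2$ average out, so that $\mathcal{R}(f) = x_1^2$, and the invariant factor h passes through the average unchanged.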

Our main ingredients are invariant polynomials $\mathbb{R}[x_1, x_2, \dots, x_m]^G$ and their special
representatives that generate the entire ring [6, 11, 44].

Definition 3. Polynomials $f_1, \dots, f_N$ from $\mathbb{R}[x_1, x_2, \dots, x_m]^G$ are said to generate
$\mathbb{R}[x_1, x_2, \dots, x_m]^G$ if any $f \in \mathbb{R}[x_1, x_2, \dots, x_m]^G$ can be expressed as a polynomial in terms
of $f_1, \dots, f_N$. Such $f_1, \dots, f_N$ are referred to as generators.

Definition 4. We call a system of generators $f_1, \dots, f_N$ of $\mathbb{R}[x_1, x_2, \dots, x_m]^G$ minimal
if no proper subset of $f_1, \dots, f_N$ generates $\mathbb{R}[x_1, x_2, \dots, x_m]^G$. The $f_1, \dots, f_N$ comprising a
minimal system are referred to as fundamental integral invariants.

That there always exists a finite system of such generators was proven by Hilbert for
polynomials with coefficients from fields of characteristic zero (e.g., $\mathbb{R}$) and later extended
for certain fields of positive characteristic by Noether [6, 11, 17, 44, 46]. (Note that two
minimal sets need not, in general, have the same number of generators.)

Remark 1. Let $\mathbb{C}[x_1, x_2, \dots, x_m]^G$ be the ring (also a complex algebra) of G-invariant
polynomials with complex coefficients. Then, note that for any $r(x) \in \mathbb{C}[x_1, x_2, \dots, x_m]^G$,
$\mathrm{Re}(r(x)), \mathrm{Im}(r(x)) \in \mathbb{R}[x_1, x_2, \dots, x_m]^G$ since the complex conjugation
on $\mathbb{C}[x_1, x_2, \dots, x_m]$ commutes with the G action on $\mathbb{C}[x_1, x_2, \dots, x_m]$.

The next fact is fundamental for our discussion and is a variation of a well-known 
result in invariant theory [6, 38, 44, 46]. We give a short, basic proof of this result after 
discussing Example 1 below. 

Proposition 3. Let $f_1, \dots, f_N$ generate $\mathbb{R}[x_1, x_2, \dots, x_m]^G$ and let $f = (f_1, \dots, f_N): W \to
\mathbb{R}^N$. The map $f: S_W \to \mathbb{R}^N$ mapping $[w]$, the equivalence class of $w \in W$, to $f(w)$, is
then well defined and injective. Thus, $S_W \cong f(W)$, the image of f in $\mathbb{R}^N$.

Example 1. Let $G = \mathbb{Z}_2^m$ be the group of order $2^m$ generated by the componentwise sign
inversions. As a matrix group, G is generated by m diagonal matrices $(a^k_{ii})$, $1 \le k \le m$,
with $a^k_{ii} = (-1)^{\delta_{ik}}$, where $\delta_{ik}$ is the Kronecker delta. It can be shown that $\{f_i = x_i^2,\ i =
1, \dots, m\}$ is a minimal set of generators of $\mathbb{R}[x_1, x_2, \dots, x_m]^G$. Thus, $f(W) = \mathbb{R}^m_{\ge 0}$. Any
equivalence class $[w]$, $w \in W$, is the smallest set containing w and symmetric with respect
to the reflections about all of the hyperplanes $x_i = 0$, $i = 1, \dots, m$. The size of $[w]$ is $2^l$,
where l is the number of non-zero components of w, which also stays invariant under the
transformations in G. In particular, if m = 1, this is simply the symmetry around 0.
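
The orbit structure of Example 1 is easy to enumerate directly. The sketch below is illustrative only (the function names `orbit` and `f` are chosen here for convenience): it lists the orbit of a point under all componentwise sign changes and evaluates the generators $f_i = x_i^2$ on it.

```python
from itertools import product

def orbit(w):
    """Orbit of w under all componentwise sign changes (the group Z_2^m)."""
    return {tuple(s * x for s, x in zip(signs, w))
            for signs in product((1, -1), repeat=len(w))}

def f(w):
    """Generators f_i = x_i^2 evaluated at w."""
    return tuple(x * x for x in w)

w = (3, 0, -2)                  # two non-zero components, so l = 2
O = orbit(w)                    # contains 2**2 = 4 distinct points
values = {f(v) for v in O}      # f is constant on the orbit
```

Here the orbit has $2^l = 4$ elements and f takes the single value (9, 0, 4) on it, in line with the well-definedness part of Proposition 3.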

The above example is special as N here is as low as m, the lower bound on N. This
example is also special since, in general, $f_1, f_2, \dots, f_N$ satisfy a non-trivial system of
polynomial relations $h(f_1, f_2, \dots, f_N) = 0$. This is the case, for instance, in Theorem 6 of our
main example in Section 5. Such polynomials h form an ideal $I_f = \{h \in \mathbb{R}[y_1, y_2, \dots, y_N] :
h(f_1, f_2, \dots, f_N) = 0\}$ to which we return in the conclusion (Section 6). If we were dealing
with an algebraically closed field in place of $\mathbb{R}$, then $f(W)$ would be exactly $V(I_f)$, the set
of all the zeros of polynomials $h \in I_f$ ($V(I_f)$ is the affine variety of $I_f$ [6]). In particular,
we would have $S_W \cong V(I_f)$. The image of our real mapping f is only a semi-algebraic
set [12] sitting inside $V(I_f)$ and may or may not be the whole of $V(I_f)$. In Example 1
above, $I_f = \{0\}$ is trivial and $f(W) \subset V(I_f) = \mathbb{R}^m$. Replacing $\mathbb{R}$ by the complex numbers
would produce $f(\mathbb{C}^m) = \mathbb{C}^m = V(I_f)$.

Proof of Proposition 3. The G-invariance of $f_1, \dots, f_N$ means constancy of f on
the orbits of $S_W$. Thus, $[w] \mapsto f(w)$ is indeed well defined as a map from $S_W$ onto
$f(W)$. Therefore, we need only prove that, given any two distinct orbits $O_1, O_2 \in S_W$,
$f(O_1) \ne f(O_2)$. We show this by exhibiting a G-invariant polynomial h that takes distinct
values on $O_1$ and $O_2$, then concluding that the values assumed by at least one of the N
generators on these orbits must be distinct since h can be expressed (as a polynomial)
in terms of the given generators. The finite size of the orbits allows the following crude
construction of h:

$$\tilde{h}_{O_1}(x) = \prod_{g \in G} \sum_{i=1}^{m} [x_i - (gv)_i]^2, \quad v \in O_1, \tag{2.5}$$

$$h_{O_1}(x) = \mathcal{R}(\tilde{h}_{O_1})(x). \tag{2.6}$$

The definition (2.5) ensures that $\tilde{h}_{O_1}(w) = 0$ (and consequently that $h_{O_1}(w) = 0$) if and
only if $w \in O_1$. In (2.6), we average $\tilde{h}_{O_1}$ over the group in order to guarantee
G-invariance. Note that $h_{O_1}$ separates $O_1$ from the rest of the orbits since, for each $g \in G$,
the only roots of $g\tilde{h}_{O_1}$ are the points in $O_1$. In particular, $h_{O_1}$ takes distinct values on
$O_1$ and $O_2$. □
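
The construction (2.5)-(2.6) can be tried out on a group that is not orthogonal, which is the situation where the averaging step (2.6) is actually needed. The sketch below assumes a hypothetical two-element matrix group {I, g} acting on $\mathbb{R}^2$, with g an involution chosen purely for illustration; the names `act`, `h_tilde` and `h_sep` are likewise illustrative.

```python
def act(g, v):
    """Apply a 2x2 matrix g (list of rows) to a vector v."""
    return tuple(sum(g[i][j] * v[j] for j in range(2)) for i in range(2))

# Assumed toy group {I, g}: g satisfies g*g = I but is not orthogonal.
I2 = [[1, 0], [0, 1]]
g = [[1, 1], [0, -1]]
G = [I2, g]

def h_tilde(x, v):
    """Product over the group of squared distances from x to the points a v, as in (2.5)."""
    out = 1
    for a in G:
        av = act(a, v)
        out *= sum((xi - yi) ** 2 for xi, yi in zip(x, av))
    return out

def h_sep(x, v):
    """Group average of h_tilde, as in (2.6); each element of G is its own inverse here."""
    return sum(h_tilde(act(a, x), v) for a in G) / len(G)

v = (1, 2)                                          # orbit O_1 = {(1, 2), (3, -2)}
on_orbit = [h_sep(w, v) for w in (v, act(g, v))]    # vanishes on O_1
off_orbit = h_sep((0, 1), v)                        # strictly positive off O_1
```

For this group, `h_tilde` alone takes different values at (0, 1) and at its image (1, -1), so it is not invariant, whereas the averaged `h_sep` agrees at both points and vanishes exactly on the chosen orbit.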



3. Invariant moments and determinacy of invariant measures

In its ordinary formulation, the problem of moments is whether a measure exists with
prescribed moments and, if so, whether it is unique, or determinate, within the class of all
measures $\mathcal{M}_*$ with finite moments [2], [3], page 388, [9], [14], pages 107-111, [26, 28, 45].

Several sufficient conditions for determinacy ([2], [3], pages 388-389, [9], [14], pages
107-111) and indeterminacy ([45]) are commonly known for measures on $\mathbb{R}$ or $\mathbb{R}_{\ge 0}$. For
determinacy of measures on $\mathbb{R}^m$, [9] generalizes some of those conditions and gives several
new ones, including integral conditions. A somewhat more general picture emerges if
we think of $\mathcal{M}_*$ as the special case of $\mathcal{M}_*^G$, with G being the trivial group of the identity
transformation. As a non-trivial G narrows $\mathcal{M}_*^G \subseteq \mathcal{M}_*$, the uniqueness question can then
be posed relative to this restricted class. In particular, we expect only a subset of all of
the moments to be relevant for this task.

Thus, below, we introduce G-invariant moments via G-invariant polynomials in m
indeterminates. Generators $\{f_1, \dots, f_N\}$ of the ring of the G-invariant polynomials then allow
us to formulate the notion of determinacy of G-invariant measures by their G-invariant
moments. Using the main results of [9] obtained for the case of ordinary determinacy
as a blueprint, we state several sufficient conditions for determinacy of G-invariant
measures by their G-invariant moments. These include Theorem 1, the Extended Carleman
Theorem for G-invariant moments, and some integral conditions based on quasi-analytic
weights. All of these results rely on a one-to-one correspondence between the invariant
measures on $\mathbb{R}^m$ and measures on $\mathbb{R}^N$, Lemma 1. Established via an extension of
the multinomial map $f = (f_1, \dots, f_N)$, this injective embedding is therefore a technical
underpinning of this work.
Evidently, symmetry, or invariance, has already been studied in connection with the
problem of moments. Thus, for instance, [28] studies the existence and uniqueness of
symmetric measures on $\mathbb{R}$ with given moments. Also, [9] generalizes this case and studies
determinacy of multivariate measures supported in the positive cone ("C-determinacy").
In one dimension, the correspondence between symmetric measures and measures on
the non-negative half-line is obvious and well known [14], pages 107-111. Apparently,
this correspondence easily generalizes to the multivariate setting (proof of Theorem 5.1
of [9] and Example 1 of this work), also illustrating significance of the injection of the
G-invariant measures on $\mathbb{R}^m$ into the measures on $\mathbb{R}^N$ (Lemma 1).

The invariance with respect to the continuous group of rotations on $\mathbb{R}^m$ is discussed,
for example, in [2]. In this case, all of the invariant functions are "generated" by a single
invariant polynomial $\sum_{k=1}^{m} x_k^2$, which is a maximal invariant in the language of
equivariance theory [35, 42]. Recall, however, that we focus on finite subgroups of $GL(m,\mathbb{R})$ and
are concerned with individual measures, not entire parametric families, being fixed by
groups of transformations.

Definition 5. Given generators $f_1, f_2, \dots, f_N$, we call $E_P f^\alpha = \int_W f^\alpha\,dP(x)$ the mixed
G-invariant moment of order $\alpha$ and denote it by $s_\alpha(P)$.

Let us also denote by $s(P)$ the set of all such moments $\{s_\alpha(P)\}_{\alpha \in \mathbb{N}^N}$ for a given
measure P and generators $f_1, f_2, \dots, f_N$. When P is clear from the context, we overload
the notation by writing $s_n(k)$ for $E_P f_n^k$, $k \in \mathbb{N}$, $1 \le n \le N$.
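
For an empirical measure, the mixed G-invariant moment of Definition 5 is simply a sample average of $f^\alpha$. The following sketch assumes the generators $f_i = x_i^2$ of Example 1 with m = 2 and a made-up sample; the function name `invariant_moment` is hypothetical.

```python
from math import prod

def invariant_moment(sample, generators, alpha):
    """Empirical mixed G-invariant moment: the mean of prod_n f_n(x)**alpha(n)."""
    return sum(prod(f(x) ** a for f, a in zip(generators, alpha))
               for x in sample) / len(sample)

# Generators of Example 1 for m = 2 (sign-change group).
generators = [lambda x: x[0] ** 2, lambda x: x[1] ** 2]

single = [(1, 2)]                                   # a point mass
symmetrized = [(1, 2), (-1, 2), (1, -2), (-1, -2)]  # its orbit average

s11_single = invariant_moment(single, generators, (1, 1))
s11_sym = invariant_moment(symmetrized, generators, (1, 1))
```

The two samples give the same invariant moments although only the second is G-invariant, which is why determinacy is formulated within the class of G-invariant measures.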

Next, we formalize the following intuitive fact. 

Proposition 4. Let $f_1, \dots, f_N$ be a generating set. Then,

$$\mathcal{M}_*^G = \{P \in \mathcal{M}^G : E_P|f^\alpha| < \infty\ \forall \alpha \in \mathbb{N}^N\}.$$

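Proof. The inclusion of $\mathcal{M}_*^G$ in the right-hand side is obvious. To show the other
inclusion, we take $\alpha^* \in \mathbb{N}^m$ arbitrary and P in the right-hand side and otherwise arbitrary. Let $S_k$ be the
set of all k-subsets of $\{1, \dots, m\}$, write $W_\sigma = \{x \in W : |x_i| > 1,\ i \in \sigma;\ |x_i| \le 1,\ i \notin \sigma\}$
for $\sigma \in S_k$, and note that

$$E_P|X^{\alpha^*}| = \sum_{k=0}^{m} \sum_{\sigma \in S_k} \int_{W_\sigma} |x^{\alpha^*}|\,dP \le \sum_{k=0}^{m} \sum_{\sigma \in S_k} \int_{W} \prod_{i \in \sigma} x_i^{2\alpha^*(i)}\,dP$$

$$= \sum_{k=0}^{m} \sum_{\sigma \in S_k} \int_{W} \prod_{i \in \sigma} x_i^{2\alpha^*(i)}\,d\mathcal{R}^*P = \sum_{k=0}^{m} \sum_{\sigma \in S_k} \int_{W} \mathcal{R}\Bigl(\prod_{i \in \sigma} x_i^{2\alpha^*(i)}\Bigr)\,dP,$$

which is finite. In the above we used the G-invariance of P, $P = \mathcal{R}^*P$, and the fact that $\mathcal{R}^*$ and $\mathcal{R}$ are adjoint (Section 2).
The conclusion follows from the fact that $\mathcal{R}(\prod_{i \in \sigma} x_i^{2\alpha^*(i)})$ is G-invariant and is hence a
polynomial in f-generators: $\sum_\alpha c_\alpha f^\alpha$, but $E_P f^\alpha \le E_P|f^\alpha| < \infty$ for all $\alpha \in \mathbb{N}^N$. □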
Definition 6. Let $P \in \mathcal{M}_*^G$ have $s(P)$, its G-invariant moments, relative to some
minimal generating set. P is then said to be G-determinate by $s(P)$, or simply G-determinate,
if no other measure in $\mathcal{M}_*^G$ has the same set of moments $s(P)$ relative to the chosen
generating set.

Since this is a key definition, we prove its correctness, that is, its independence of the 
choice of generators. 

Proof. Let $f_1, \dots, f_N$ and $h_1, \dots, h_L$ be two distinct minimal sets of generators and let
$s_f(P)$ and $s_h(P)$ be the corresponding sets of G-invariant moments. Suppose that P is
the only measure in $\mathcal{M}_*^G$ possessing $s_f(P)$ and suppose that there exists $Q \in \mathcal{M}_*^G$ such
that $Q \ne P$ and $s_h(P) = s_h(Q)$. There must then exist $\alpha \in \mathbb{N}^N$ such that $E_P f^\alpha \ne E_Q f^\alpha$.
Since $f^\alpha$ is G-invariant, it can be written as a polynomial in h-generators: $\sum_\beta a_\beta h^\beta$, but
for each monomial, we then have $E_P h^\beta = E_Q h^\beta$. This contradicts $E_P f^\alpha \ne E_Q f^\alpha$. □

We next give a generalized version of the extended Carleman theorem [9] . 

Theorem 1 (Extended Carleman theorem for G-invariant measures). Let
$f_1, \dots, f_N$ be some minimal set of generators. Let $P \in \mathcal{M}_*^G$ and assume that for each
$n = 1, \dots, N$, $\{s_n(k)\}_{k=1}^{\infty}$ satisfies Carleman's condition

$$\sum_{k=1}^{\infty} \frac{1}{s_n(2k)^{1/2k}} = \infty. \tag{3.1}$$

P is then determinate by G-invariant moments. Also, $\mathbb{C}[x_1, x_2, \dots, x_m]^G$ and
$\mathrm{Span}_{\mathbb{C}}\{e^{i(\lambda, f(x))} \mid \lambda \in S\}$ are dense in $L_p^G(W, P)$, the G-invariant subspace of complex $L_p(W, P)$, for $1 \le p < \infty$
and for every $S \subset \mathbb{R}^N$ which is somewhere dense (i.e., $\bar{S}$, the closure of S, has a
non-empty interior).
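
To get a feeling for condition (3.1), one can tabulate its partial sums for a measure with known moments. The sketch below assumes the standard normal distribution on $\mathbb{R}$ with $G = \{\pm 1\}$ and the single generator $f = x^2$, so that $s(j) = E X^{2j} = (2j-1)!!$; this choice and the function names are illustrative assumptions. Partial sums can only suggest, not prove, divergence; here the terms behave like a constant multiple of 1/k, so the series does diverge.

```python
from math import exp, log

def carleman_partial_sums(log_moment, K):
    """Partial sums of sum_k s(2k)**(-1/(2k)), given j -> log s(j)."""
    total, sums = 0.0, []
    for k in range(1, K + 1):
        total += exp(-log_moment(2 * k) / (2 * k))
        sums.append(total)
    return sums

def log_moment_gauss_sq(j):
    """log E[(X^2)^j] = log (2j-1)!! for X standard normal (assumed example)."""
    return sum(log(i) for i in range(1, 2 * j, 2))

sums = carleman_partial_sums(log_moment_gauss_sq, 500)
```

Working with logarithms of the moments avoids overflow, since the double factorials grow very quickly.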

Proof. The proof of the first statement takes two steps. First, note that the map $f =
(f_1, \dots, f_N): W \to \mathbb{R}^N$ as in Proposition 3 induces an injection f of $\mathcal{M}_*^G$ to $\mathcal{M}_*(\mathbb{R}^N)$, the set of
probability measures on $\mathbb{R}^N$ with finite mixed absolute moments ($E|X^\alpha| < \infty\ \forall \alpha \in \mathbb{N}^N$),
via $f(P) = P \circ f^{-1}$.

Lemma 1. The map $f: \mathcal{M}_*^G \to \mathcal{M}_*(\mathbb{R}^N)$ is one-to-one.

Proof. Let $P, Q \in \mathcal{M}_*^G$ be distinct and let $B \in \mathcal{B}(\Omega)$ be such that $P(B) > Q(B)$. Now,
define $h(x) = \mathcal{R}(1_B(x))$, the G-symmetrized indicator function of B. Next, note that
$P(B) = E_P 1_B(X) = E_P h(X)$, where the random vector X is distributed according to P,
and the second equality is a consequence of G-invariance of P. Also, note that, similarly,
$Q(B) = E_Q h(X)$ and therefore $E_P h(X) > E_Q h(X)$.

Observe that the level sets $h^{-1}(x > c)$ for any $c \in \mathbb{R}$ are also G-invariant:

$$g h^{-1}(x > c) = \{gw : w \in W,\ h(w) > c\} = \{w' : g^{-1}w' \in W,\ h(g^{-1}w') > c\}$$
$$= \{w' : g^{-1}w' \in W,\ gh(w') > c\} = \{w' : g^{-1}w' \in W,\ h(w') > c\}$$
$$= \{w' : w' \in W,\ h(w') > c\} = h^{-1}(x > c).$$

Now, $E_P h(X) = \sum_{c \in \{h(w) : w \in W\}} c\,P(h(X) = c)$, where the summation has a finite number
of terms due to the special form of h. Hence, there must be at least one c such that
$P(h(X) > c) > Q(h(X) > c)$, which gives us a G-invariant set $A = h^{-1}(x > c)$ (that is
obviously also Borel) on which P and Q differ.

It now remains to prove that $f(P) \ne f(Q)$. To this end, we show that

$$f(P)(fA) = P(f^{-1}fA) \overset{(*)}{=} P\Bigl(f^{-1}f \bigcup_{O \subset A} O\Bigr) = P\Bigl(\bigcup_{O \subset A} f^{-1}f(O)\Bigr) \overset{(**)}{=} P\Bigl(\bigcup_{O \subset A} O\Bigr) \overset{(***)}{=} P(A).$$

In (∗) and (∗∗∗), the fact that $A = \bigcup_{O \subset A} O$ is used and (∗∗) follows from Proposition 3.
Summarizing the above, we obtain $f(P)(fA) > f(Q)(fA)$. □

Second, suppose that $P, Q \in \mathcal{M}_*^G$, $P \ne Q$ and $s(P) = s(Q)$, and the condition (3.1) of
Theorem 1 is satisfied. By Lemma 1, $f(P) \ne f(Q)$ and by definition, the latter measures
have all their mixed (ordinary N-dimensional) moments identical and satisfying the
conditions of the extended Carleman theorem ([9]). Thus, according to that theorem, $f(P)$
is determinate, that is, $f(P) = f(Q)$, which contradicts our previous observation.

The proof of the denseness results closely parallels that of Theorem 2.3 of [9]. Let
$1 \le p < \infty$ be fixed and let $h \in L_q^G(W, P)$, where $1/q + 1/p = 1$, be such that

$$\int_W r(x)h(x)\,dP(x) = 0 \tag{3.2}$$

$\forall r \in \mathbb{C}[x_1, x_2, \dots, x_m]^G$. In order to prove that $h = 0$ P-a.s., we first note that due to
G-invariance of h combined with Proposition 3, there exists $\tilde{h}: \mathbb{R}^N \to \mathbb{C}$ such that $h = \tilde{h}(f)$.
Next, following [9], we perform the Fourier-like transform

$$\xi(\lambda) = \int_W e^{i(\lambda, f(x))} h(x)\,dP(x) = \int_{\mathbb{R}^N} e^{i(\lambda, y)} \tilde{h}(y)\,d[f(P)](y), \tag{3.3}$$

resulting in a smooth function on $\mathbb{R}^N$. All derivatives of this function vanish at $0 \in \mathbb{R}^N$
since (3.2) implies that

$$\int_{\mathbb{R}^N} y^\alpha \tilde{h}(y)\,d[f(P)](y) = 0 \quad \forall \alpha \in \mathbb{N}^N.$$

From this point, the corresponding part of the proof in [9] applies to conclude that
under the hypotheses of the present theorem and based on Theorem 2.1 of [9], $\xi(\lambda)$ is
identically 0. This, in turn, implies that $\tilde{h} = 0$ $f(P)$-a.s., which finally implies that $h = 0$
P-a.s.

The denseness of $\mathrm{Span}_{\mathbb{C}}\{e^{i(\lambda, f(x))} \mid \lambda \in S\}$ can be proven by a similar chain of arguments,
replacing $\lambda$ in the right-hand side of (3.3) by $\lambda + a$, where $a \in \mathrm{Interior}(\bar{S})$. □

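Example 1 continued. Let $\mathcal{M}^C$ be the set of positive Borel measures with supports
in $C = f(W) = \mathbb{R}^m_{\ge 0}$ and let $\mathcal{M}_*^C = \mathcal{M}_* \cap \mathcal{M}^C$. Lemma 1 then applies (N = m) to show
that $\mathcal{M}^G \cong \mathcal{M}^C$ and $\mathcal{M}_*^G \cong \mathcal{M}_*^C$ as sets and that $f(\mathcal{M}^G) = \mathcal{M}^C$ and $f(\mathcal{M}_*^G) = \mathcal{M}_*^C$.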

3.1. Integral criteria for G-invariant determinacy 

In [9], it is argued that integral criteria for determinacy are more convenient in practice
than series conditions such as Carleman's conditions and the notion of quasi-analytic
weights is introduced in order to formulate suitable integral conditions. Thus, following
[9], we introduce the following.

Definition 7. A quasi-analytic weight on W is a bounded non-negative function $w:
W \to \mathbb{R}_{\ge 0}$ such that

$$\sum_{k=1}^{\infty} \frac{1}{\|(v_j, x)^k w(x)\|_\infty^{1/k}} = \infty$$

for $j = 1, \dots, m$ and $v_1, \dots, v_m$, some basis for W.

The following are simple generalizations of Theorems 4.1 and 4.2 of [9] that provide 
sufficient integral conditions for determinacy by invariant moments. We omit proofs of 
these results since they are straightforward analogs of their prototypes in [9] and are 
based on the same "change of variable" argument that we used to prove Theorem 1. 

Theorem 2. Let $P \in \mathcal{M}^G$ be such that

$$\int_W w(f(x))^{-1}\,dP < \infty$$

for some measurable quasi-analytic weight w on $\mathbb{R}^N$. P is then determinate by its
G-invariant moments. Furthermore, $\mathbb{C}[x_1, x_2, \dots, x_m]^G$ and $\mathrm{Span}_{\mathbb{C}}\{e^{i(\lambda, f(x))} \mid \lambda \in S\}$ are dense
in (complex) $L_p^G(W, P)$ for $1 \le p < \infty$ and for every $S \subset \mathbb{R}^N$ which is somewhere dense.

Following [9], we point out that due to the rapidly decreasing behavior of w, the
assumption of the theorem implies that P is necessarily in $\mathcal{M}_*^G$.

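Theorem 3. For $j = 1, \dots, N$, let $R_j > 0$ and let a non-decreasing function $\rho_j: (R_j, \infty) \to
\mathbb{R}_{\ge 0}$ of class $C^1$ be such that

$$\int_{R_j}^{\infty} \frac{\rho_j(s)}{s^2}\,ds = \infty.$$

Define $h_j: \mathbb{R} \to \mathbb{R}_{\ge 0}$ by

$$h_j(x) = \begin{cases} \exp\Bigl(\int_{R_j}^{|x|} \frac{\rho_j(s)}{s}\,ds\Bigr) & \text{for } |x| > R_j, \\ 1 & \text{for } |x| \le R_j. \end{cases}$$

Let A be an affine automorphism of $\mathbb{R}^N$. If $P \in \mathcal{M}^G$ is such that

$$\int_W \prod_{j=1}^{N} h_j((Af(x))_j)\,dP(x) < \infty,$$

then P is determinate by its G-invariant moments. Also, $\mathbb{C}[x_1, x_2, \dots, x_m]^G$ and
$\mathrm{Span}_{\mathbb{C}}\{e^{i(\lambda, f(x))} \mid \lambda \in S\}$ are dense in (complex) $L_p^G(W, P)$ for $1 \le p < \infty$ and for every
$S \subset \mathbb{R}^N$ which is somewhere dense.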

Other integral criteria discussed in [9] also have their G-invariant formulations similar
to the ones above. Thus, for example, Theorem 4.3 of [9] provides a significantly weakened
version of the following classical condition for determinacy:

$$\int_W \exp(\|x\|)\,dP(x) < \infty.$$

Both the classical condition and its weakened versions due to [9] easily incorporate
G-invariance via $\|x\| \mapsto \|f(x)\|$.

4. Sequential G-invariant modeling 

Hereafter, we specialize our discussion to probability measures $\mathcal{P}$. The following result lays
a foundation for modeling invariant distributions via (invariant) moment constraints and
is an extension of [3], Theorem 30.2, page 390, for ordinary moments.

Theorem 4. Let a sequence of G-invariant probability measures $\{P_l\}_{l=1}^{\infty} \subset \mathcal{P}^G$ be such
that

$$\forall \alpha \in \mathbb{N}^N \quad \lim_{l \to \infty} E_{P_l} f^\alpha = s_\alpha. \tag{4.1}$$

Assume that there can exist at most one G-invariant P with such $s_\alpha$. Then, such P
indeed exists and $P_l \Rightarrow P$.

Note that such P would necessarily be in $\mathcal{M}_*^G$.

Proof of Theorem 4. Clearly ([14], page 90), (4.1) implies that the m families of
marginals of $P_l$'s are individually tight, which immediately implies that the family $\{P_l\}_{l=1}^{\infty}$
is itself tight and therefore ([3], page 380) contains a weakly convergent subsequence. Since
every subsequential limit must also be G-invariant and have the same moments $s_\alpha$, all
such limits must be equal to each other by the uniqueness hypothesis of the theorem. We
take P to be the common value of those limits and complete the proof by invoking the
well-known fact ([3], page 381) that a tight sequence whose (weak) subsequential limits
are all equal converges weakly to that common measure. □

One natural way to construct such sequences, whether based on theoretical or empirical 
data, is via the principle of maximum entropy (ME) [5] . To further illustrate applicability 
of invariant polynomials to probability and statistics, a choice of framework needs to be 
made. The ME principle can be derived axiomatically [8, 43]. Here, the ME framework 
is also chosen for naturally linking the abstraction of the problem of moments with the 
concreteness of the log-linear [33], or toric [18, 39], statistical models which we use in
our main example (Section 5). Mostly due to its connection with information theory, the 
ME approach has also been popular in image analysis and computer vision, from where 
our main example originates. It is certainly conceivable that availability of G-invariant 
generators can be useful in other statistical frameworks, for example, projection pursuit 
regression [21], pages 347-350, or linear models with G-invariant predictors. 

After introducing the ME problem in some generality in Section 4.1, we specialize it
in Section 4.3 to $\Omega$ finite as needed for our example in Section 5.

4.1. G-invariant maximum entropy modeling 

Let a probability measure P be absolutely continuous with respect to some positive
$\sigma$-finite reference measure $\mu$, $P \ll \mu$, and let p be a density $dP/d\mu$. Assume that sets $\Omega$ of
interest are always contained in the support of $\mu$. Let $H_\mu(P) = -\int_\Omega p(x) \log p(x)\,d\mu(x)$
be the entropy of P relative to $\mu$ (for P discrete, a natural choice for $\mu$ is the counting
measure on $\Omega$, the support of P: $H(P) = -\sum_\Omega p(x) \log p(x)$, the Shannon entropy; for P
continuous, a natural choice is the Lebesgue measure on $\Omega$: $H(P) = -\int_\Omega p(x) \log p(x)\,dx$).
In the absence of ambiguity, we suppress the subscript. The Kullback-Leibler distance,
or I-divergence, of probability measure P from probability measure Q, is given by
$D(P\|Q) = \int_\Omega p(x) \log(p(x)/q(x))\,d\mu(x)$, where p and q are densities of P and Q,
respectively, relative to $\mu$.

The following inequalities are both useful for our discussion below and complement 
the standard "data reduction" identities of information theory [48] . 

Proposition 5. Let P have density p relative to $\mu$. Then,

$$H(P) \le H(\mathcal{R}^*(P)) \le H(P) + \log|G|.$$

The first inequality becomes equality if and only if P is G-invariant or $H(P) = \infty$.

Proof. First, if $H(P) = \infty$, then the inequalities trivially become equalities. Assume that
$H(P) < \infty$. To see the first inequality, recall that $D(P\|Q) \ge 0$ with equality if
and only if $P = Q$. Then, notice that $0 \le D(P\|\mathcal{R}^*(P)) = -H(P) + E_P \log(1/\mathcal{R}(p)(X))$
and $E_P \log(1/\mathcal{R}(p)(X)) = E_{\mathcal{R}^*(P)} \log(1/\mathcal{R}(p)(X)) = H(\mathcal{R}^*(P))$. Finally, noticing that
$|O| \le |G|$ for all $O \in S_W$ gives

$$D(P\|\mathcal{R}^*(P)) \le \int_\Omega p(x) \log \frac{p(x)}{\sum_{y \in [x]} p(y)/|[x]|}\,d\mu(x) \le \int_\Omega p(x) \log|[x]|\,d\mu(x) \le \log|G|.$$

Summarizing the above, $H(\mathcal{R}^*(P)) = H(P) + D(P\|\mathcal{R}^*(P)) \le H(P) + \log|G|$. □
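
The two inequalities of Proposition 5 are easy to verify numerically for a discrete distribution. The sketch below assumes $G = \{\pm 1\}$ acting on a finite set of integers by $x \mapsto -x$, the counting reference measure and a made-up probability vector; the function names are illustrative.

```python
from math import log

def entropy(p):
    """Shannon entropy of a discrete distribution given as {point: probability}."""
    return -sum(q * log(q) for q in p.values() if q > 0)

def symmetrize(p):
    """R*(P) for the assumed group G = {+1, -1} acting by x -> -x."""
    points = set(p) | {-x for x in p}
    return {x: 0.5 * (p.get(x, 0.0) + p.get(-x, 0.0)) for x in points}

# Hypothetical asymmetric distribution on {-2, ..., 2}.
p = {-2: 0.05, -1: 0.10, 0: 0.20, 1: 0.25, 2: 0.40}
p_sym = symmetrize(p)
H_p, H_sym = entropy(p), entropy(p_sym)
```

For this p, the symmetrized distribution has strictly larger entropy, and the gain does not exceed $\log|G| = \log 2$; applying `symmetrize` a second time leaves the distribution, and hence its entropy, unchanged.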

Let $\mathcal{F}$ be a finite set of (measurable) real-valued functions on (G-invariant) $\Omega$, and
$\{\nu_\phi \in \mathbb{R}\}_{\phi \in \mathcal{F}}$. Let

$$P_{\mathcal{F}} = \arg\max_{P' :\, E_{P'}\phi = \nu_\phi,\ \phi \in \mathcal{F}} H(P'), \tag{4.2}$$

a maximum entropy distribution relative to the above constraints. When it exists, the
ME distribution is unique (due to convexity of the constraints and concavity of the
entropy functional) and of exponential form (4.3) [7] (for clarification, see Remark 2
below). Since we are going to work with (invariant) moment constraints of the form
$E_{P'} f^\alpha = E_P f^\alpha$, $\alpha \in \Lambda \subset \mathbb{N}^N$, where P is some fixed measure, we will be writing $P_\Lambda$ for
the maximum entropy distribution. (In the context of entropy maximization, "moment
functions" are sometimes interpreted broadly as essentially any functions [25].)

Theorem 5. Let $P$ be a probability measure on $\mathbb{R}^m$ supported on G-invariant $\Omega$ and having a density relative to some $\mu$. Assume that $\mathcal{R}^*(P)$ is G-determinate. (Note that G-invariance of $\Omega$ implies that $\mathcal{R}^*(P)$ is also a probability measure on $\Omega$.) Let $f_1, \ldots, f_N$ be a minimal generating set for $\mathbb{R}[x_1, x_2, \ldots, x_m]^G$. Let $A_1 \subset A_2 \subset \cdots$ be such that $\bigcup_{l=1}^{\infty} A_l = \mathbb{N}^N$ and such that the corresponding maximum entropy problems (4.2) with $\mu_{f^\alpha} = \mathrm{E}_P f^\alpha$, $\alpha \in A_l$, have solutions $P_l = P_{A_l}$. Then $P_l \Rightarrow \mathcal{R}^*(P)$.

Proof. First, note the key fact that for any (measurable) G-invariant function $\phi$, $\mathrm{E}_P\phi = \mathrm{E}_P\mathcal{R}(\phi) = \mathrm{E}_{\mathcal{R}^*(P)}\phi$, implying that if $P'$ satisfies the constraints, then so does $\mathcal{R}^*(P')$. Thus, if $P_l$ exists, then it is necessarily G-invariant. Indeed, suppose that it were not, that is, $\mathcal{R}^*(P_l) \ne P_l$. Then $H(\mathcal{R}^*(P_l)) > H(P_l)$ (Proposition 5), contradicting maximality of $H(P_l)$. That $P_l$ is G-invariant can also be seen directly from the exponential form (4.3) of $p_l(x)$, the density of the maximum entropy distribution, which is self-evidently G-invariant:

$$p_l(x) = \exp\Bigl(\sum_{\alpha\in A_l}\lambda_\alpha f^\alpha(x) - \psi(\lambda)\Bigr), \tag{4.3}$$

$$\psi(\lambda) = \log\int_\Omega \exp\Bigl(\sum_{\alpha\in A_l}\lambda_\alpha f^\alpha(x)\Bigr)\,d\mu(x),$$

$$\lambda = (\lambda_{\alpha_1}, \ldots, \lambda_{\alpha_{|A_l|}}) :\ \mathrm{E}_{P_l} f^\alpha = \mathrm{E}_P f^\alpha,\quad \alpha\in A_l. \tag{4.4}$$

Finally, Theorem 4 is applied to complete the proof. $\square$

Remark 2. 1. In general, the existence of a solution to the maximum entropy problem cannot be taken for granted. In addition to the detailed and extensive classical treatment of the problem by [7], various generalized conditions for the existence of ME distributions under constraints more general than our moment constraints continue to be studied in the literature [25], but are largely outside the scope of this discussion, with the following exceptions of $\Omega$ compact and, in particular, $\Omega$ finite.

2. If $\Omega$ is compact, then, first of all, determinacy is no longer an issue due to the uniform approximation of compactly supported continuous functions by polynomials. Thus, the uniqueness and G-determinacy hypotheses in Theorems 4 and 5, respectively, can be removed (provided that the $\{P_l\}_{l=1}^{\infty}$ in the statement of Theorem 4 are all supported on the same $\Omega$). Next, based on [7], Theorem 2.1, $H_\mu(P') > -\infty$ for at least one feasible probability measure $P'$ in (4.2) implies that all subsets $A \subset \mathbb{N}^N$ give rise to well-posed ME problems due to boundedness of polynomials on compact $\Omega$ ([7], page 154). Hence, the respective existence hypothesis on $P_l$ in Theorem 5 can also be removed in this case.

3. If $\Omega$ is finite, then the provisions for $H(P') > -\infty$ are redundant as the entropy is non-negative in this case. Thus, the ME problem in the form (4.2) for $\Omega$ finite is well posed for all $A \subset \mathbb{N}^N$ and all probability measures $P$.

Some more remarks regarding the exponential form (4.3) and the parameters $\lambda$ are in order. We again restrict ourselves to $\Omega$ compact or finite.

Remark 3. 1. Assume that the ME solution exists. If the support of $P$ is the whole of $\Omega$, then the exponential form (4.3) of the ME distribution is valid as it is. Otherwise, as follows from [7], Theorem 3.1, $p_l$ in (4.3) would need to be premultiplied by the indicator of $\Omega \setminus \mathcal{N}$ if $\mu(\mathcal{N}) > 0$, where $\mathcal{N} \subset \Omega$ is such that $P'(\mathcal{N}) = 0$ for all feasible measures $P'$ with $H(P') > -\infty$. Also, note that G-invariance of the constraining functions implies G-invariance of $\mathcal{N}$ and $\Omega \setminus \mathcal{N}$.

2. The above special case of $\mu(\mathcal{N}) > 0$ is perhaps best understood when $\Omega$ is finite as $\mathcal{N}$ is then rather explicit. Namely, if $l$ constraining functions (including the normalizing one) are arranged in an $l \times |\Omega|$ matrix $V_\Omega$, then $\mathcal{N}$ is the intersection of $Z(P)$, the set of zeros of $P$, and $Z(\ker(V_\Omega))$, the set of zeros of all of the vectors in the kernel of $V_\Omega$. This special case of ME solution occurring on the boundary of the probability simplex turns out to be immaterial for our experiments in Section 5.6.

3. Uniqueness of ME solution apparently does not immediately imply uniqueness of $\lambda^*$'s unless the constraining functions are $\mu$-a.e. linearly independent on $\Omega \setminus \mathcal{N}$ (with $\mathcal{N}$ being commonly empty). Since our constraints are polynomial, $\lambda^*$'s are clearly always unique if $\Omega$ ($\Omega \setminus \mathcal{N}$, to be precise) is infinite.

4. $\Omega$ is finite in our models in Section 5, but linear independence of the constraining polynomials is always ensured (Section 4.3).

4.2. A greedy lookahead version

We next add to Theorem 5 a greedy lookahead feature following [10, 52]. After its original application to texture modeling [51, 52], we called this strategy "minimax learning" in [31], an earlier preprint of this work (and, initially, in [30]). "Adaptive minimax learning" further emphasizes distinction from the basic stepwise construction. Thus, in [51, 52], "minimax learning" of an unknown distribution $P$ refers to an incremental model construction, also similar to [1], in which, at each step $l$, the entropy maximization problem is solved with one new constraint added at a time. The $l$th constraint is chosen from a suitable set of functions, in our case G-invariant and/or ordinary polynomials, to minimize the Kullback-Leibler distance of the candidate ME distribution to the target distribution $P$ (equivalently, to minimize the entropy of the candidate maximum entropy distributions). It is both clear intuitively and has been verified in practice [30], including in our example in Section 5.7, that greedy selection of constraints accelerates the approximation process. For completeness of this exposition, we present the general case first, followed in Section 4.3 by the refined algorithm for the finite case.

Next, we need to order our constraints.

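Definition 8. A total well-ordering $\prec$ of $\mathbb{N}^N$ (and equivalently of $\{f^\alpha\}_{\alpha\in\mathbb{N}^N}$) such that $\alpha, \beta, \gamma \in \mathbb{N}^N$ and $\alpha \prec \beta$ imply $\alpha + \gamma \prec \beta + \gamma$ is called a monomial ordering [6].

For $\alpha, \beta \in \mathbb{N}^N$ and for non-empty $A \subset \mathbb{N}^N$, define also

$$d_\prec(\alpha, \beta) = |\{\gamma \in \mathbb{N}^N : \min\nolimits_\prec(\alpha, \beta) \prec \gamma \preceq \max\nolimits_\prec(\alpha, \beta)\}|,$$

$$d_\prec(\alpha, A) = d_\prec(A, \alpha) = \min_{\beta\in A} d_\prec(\alpha, \beta),$$

discrete distances relative to $\prec$, and for $d \in \mathbb{N}$, define discrete $d$-"shells" around $A$ as $B_\prec(A, d) = \{\alpha \in \mathbb{N}^N : d_\prec(A, \alpha) \le d\}$. The following corollary specializes Theorem 5 by proposing a particular choice of $A_l$.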

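Corollary 1. Consider the hypotheses of Theorem 5. Fix a monomial ordering $\prec$ and a positive integer parameter $r$ and let $0 = (0, \ldots, 0) \in \mathbb{N}^N$. Define $P_{A_l}$ in accordance with the following scheme:

$$A_1 = \{\alpha_1^*\}, \quad\text{where } \alpha_1^* = \arg\min_{\alpha\in B_\prec(\{0\}, r)} D(P\|P_{\{\alpha\}});$$

$$A_l = A_{l-1} \cup \{\alpha_l^*\} \ \text{for } l = 2, 3, \ldots, \quad\text{where } \alpha_l^* = \arg\min_{\alpha\in B_\prec(A_{l-1}, r)} D(P\|P_{A_{l-1}\cup\{\alpha\}}).$$

Then, $P_l \Rightarrow \mathcal{R}^*(P)$.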
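The distances and shells entering Corollary 1 can be enumerated explicitly once a concrete monomial ordering is fixed. The sketch below assumes the graded lexicographic ordering in which ties in total degree are broken by the rightmost differing entry, and it treats the shell as closed (distance at most $d$); the function names `graded_lex` and `shell` are hypothetical.

```python
def graded_lex(max_degree, n_vars):
    """Exponent vectors of total degree <= max_degree, sorted in graded lexicographic order."""
    vectors = [()]
    for _ in range(n_vars):
        vectors = [v + (e,) for v in vectors for e in range(max_degree + 1)]
    admissible = [v for v in vectors if sum(v) <= max_degree]
    # Total degree first; ties are decided by the rightmost entry in which two vectors differ.
    return sorted(admissible, key=lambda v: (sum(v), v[::-1]))

def shell(A, d, n_vars):
    """Indices whose discrete distance to the set A is at most d."""
    # Moving d positions up in a graded ordering raises the total degree by at most d.
    max_degree = max(sum(a) for a in A) + d
    order = graded_lex(max_degree, n_vars)
    rank = {alpha: i for i, alpha in enumerate(order)}
    # For a total order, the distance between two indices is the difference of their ranks.
    return [a for a in order if min(abs(rank[a] - rank[b]) for b in A) <= d]

print(shell([(0, 0)], 3, 2))
```

Because the ordering is total, the distance $d_\prec(\alpha, \beta)$ reduces to a difference of positions in the sorted list, which is what the `rank` dictionary exploits.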

Remark 4. 1. Note that the minima of $D$ always exist since $D$ is minimized over a finite set. Potential ties in the minimization can in principle be broken arbitrarily, and in our computations, minimum under $\prec$ is used for technical convenience.

2. If $P \ne \mathcal{R}^*(P)$, then $D(P\|Q)$ need not in general equal $D(\mathcal{R}^*(P)\|Q)$, even if $Q = \mathcal{R}^*(Q)$. However, there is no need to replace the target distribution $P$ by its symmetrized version thanks to $D(P\|P_A) = D(P\|\mathcal{R}^*(P)) + D(\mathcal{R}^*(P)\|P_A)$, which is easy to verify. Hence, minimizing $D(P\|P_{A_{l-1}\cup\{\alpha\}})$ is equivalent to minimizing $D(\mathcal{R}^*(P)\|P_{A_{l-1}\cup\{\alpha\}})$.

3. At each step $l = 1, 2, \ldots$, the procedure "explores" up to $r$ next candidate dimensions. A dimension that promises the fastest approach toward $\mathcal{R}^*(P)$ (or, equivalently, $P$) is chosen and the current model is augmented accordingly. Note that when $\Omega$ is infinite, each new dimension is linearly independent of $\mathrm{Span}\{f^\alpha : \alpha \in A_l\}$, the span of the current model terms (Remarks 3 above). For $\Omega$ finite, it can hypothetically happen that none of the proposed $r$ dimensions is actually new. This situation is prevented in Section 4.3.

4. Let $D_l = D(P\|P_l)$ and $H_l = H(P_l)$, for $l = 1, 2, \ldots$. It can be easily seen that $\{D_l\}$ and $\{H_l\}$ are strictly decreasing (provided at least one linearly independent term is considered at each step). Clearly, if $f^\alpha \in \mathrm{Span}\{f^\beta : \beta \in A_l\}$, then $D_l = D(P\|P_{A_l\cup\{\alpha\}})$, that is, adding a linearly dependent vector does not change the model and is therefore avoided by the minimization phase of the procedure.
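The decomposition used in the second of Remarks 4 can be written out in one line. The following derivation is a sketch under the assumptions that the density $p_A$ of $P_A$ is G-invariant (as established in the proof of Theorem 5) and that the reference measure is such that $\mathrm{E}_P\phi = \mathrm{E}_{\mathcal{R}^*(P)}\phi$ holds for G-invariant $\phi$; it is applied here to $\phi = \log(\mathcal{R}(p)/p_A)$.

```latex
\begin{aligned}
D(P\|P_A)
  &= \int_\Omega p \log\frac{p}{p_A}\,d\mu
   = \int_\Omega p \log\frac{p}{\mathcal{R}(p)}\,d\mu
   + \int_\Omega p \log\frac{\mathcal{R}(p)}{p_A}\,d\mu \\
  &= D(P\|\mathcal{R}^*(P))
   + \int_\Omega \mathcal{R}(p) \log\frac{\mathcal{R}(p)}{p_A}\,d\mu
   = D(P\|\mathcal{R}^*(P)) + D(\mathcal{R}^*(P)\|P_A).
\end{aligned}
```

Since the first term does not depend on $A$, ranking candidate sets by $D(P\|P_A)$ or by $D(\mathcal{R}^*(P)\|P_A)$ gives the same order.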

4.3. Adaptive minimax learning of symmetric distributions on finite $\Omega$

Let $\Omega$ be finite and let us identify $f^\alpha$ with the $K$-dimensional vector $(f^\alpha(\omega_1), \ldots, f^\alpha(\omega_K)) \in (\mathbb{R}^K)^G$ relative to some enumeration $k(\cdot)$ of $\Omega$. Note the following result.



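Proposition 6. Let $M = |S_\Omega|$. There exist $\alpha_1, \ldots, \alpha_M \in \mathbb{N}^N$ such that $\{f^{\alpha_k}\}_{k=1}^{M}$ is a basis for $(\mathbb{R}^K)^G$. Furthermore, $\alpha_1$ can always be taken to be $0$.

Corollary 2. For any $\alpha \in \mathbb{N}^m$, let $x^\alpha = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_m^{\alpha_m}$ be identified with $K$-vectors $(x^\alpha(\omega_1), \ldots, x^\alpha(\omega_K)) \in \mathbb{R}^K$. There then exist $\alpha_1, \alpha_2, \ldots, \alpha_K \in \mathbb{N}^m$ such that $\{x^{\alpha_k}\}_{k=1}^{K}$ is a basis for $\mathbb{R}^K$.

Proof. The corollary follows immediately from the proposition by taking $G$ to be the trivial group, in which case $x_1, x_2, \ldots, x_m$ trivially comprise a minimal set of generators of $\mathbb{R}[x_1, x_2, \ldots, x_m]^G$.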

The proposition simply states that $(\mathbb{R}^K)^G$ has a basis in terms of the G-invariant polynomials. One such basis, for example, is given by $\{I_O\}_{O\in S_\Omega}$, the set of all orbit indicators computed, for example, as follows:

$$I_O(x) = \frac{h_O(x)}{h_O(O)}, \tag{4.5}$$

$$h_O(x) = \prod_{O'\in S_\Omega\setminus\{O\}}\ \sum_{i=1}^{N}\,[f_i(x) - f_i(O')]^2, \qquad h_O(O) = \prod_{O'\in S_\Omega\setminus\{O\}}\ \sum_{i=1}^{N}\,[f_i(O) - f_i(O')]^2,$$

and $f([\omega]) = f(\omega)$ $\forall\omega \in \Omega$ is well defined (with $[\omega] \in S_\Omega$, Proposition 3). Since $h_O(x) \in \mathbb{R}[x_1, x_2, \ldots, x_m]^G$ and $M < \infty$, the set of all $f^\alpha(x)$'s participating in the above polynomial expansions of $h_O$ is finite. Evidently, the corresponding set of $K$-dimensional vectors $f^\alpha$ spans $(\mathbb{R}^K)^G$ and therefore contains a desired basis with $M$ elements. Clearly, by replacing any (say, the first) of the orbit indicators by the constant vector $f^0 = 1$, we obtain another basis. $\square$
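The orbit-indicator construction (4.5) can be tried on a toy case. The sketch below assumes a hypothetical setting that is not taken from the paper: the two-element group swapping the coordinates of the plane, the generators $x_1 + x_2$ and $x_1 x_2$, and $\Omega = \{0, 1, 2\}^2$; exact rational arithmetic is used so that the indicator values are exactly 0 or 1.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy setting: coordinate swap on the plane, invariants x1 + x2 and x1 * x2.
generators = [lambda w: w[0] + w[1], lambda w: w[0] * w[1]]
omega = list(product(range(3), repeat=2))

def orbit(w):
    """Orbit of w under the swap of the two coordinates."""
    return frozenset({w, (w[1], w[0])})

orbits = sorted({orbit(w) for w in omega}, key=sorted)

def f_vec(w):
    """Values of all generators at the point w."""
    return [f(w) for f in generators]

def h(O, w):
    """h_O(w): product over the other orbits O' of sum_i (f_i(w) - f_i(O'))^2."""
    value = Fraction(1)
    for other in orbits:
        if other != O:
            rep = next(iter(other))  # any representative works, the f_i are invariant
            value *= sum((a - b) ** 2 for a, b in zip(f_vec(w), f_vec(rep)))
    return value

def indicator(O, w):
    """Orbit indicator I_O(w) = h_O(w) / h_O(O), a G-invariant polynomial in w."""
    return h(O, w) / h(O, next(iter(O)))

print(len(orbits), [int(indicator(orbits[0], w)) for w in omega])
```

The product in `h` skips the orbit $O$ itself, since otherwise the normalizing value $h_O(O)$ would vanish; the construction relies on the generators taking distinct value vectors on distinct orbits.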

We need to index "horizons" of our lookahead searches, hence we require more notation. With $\beta \in \mathbb{N}^N$, $\beta \perp A$ refers to $\{f^\alpha\}_{A\cup\{\beta\}}$ being linearly independent.

Definition 9. Let $A \subset \mathbb{N}^N$ be non-empty, $d, r \in \mathbb{N}$ and $\prec$ be a monomial order. Define $B^\perp_\prec(A, d) = \{\alpha \in B_\prec(A, d) : \alpha \perp A\}$ and for any $A$ and $r$ such that $1 \le r \le M - \dim(\mathrm{Span}\{f^\alpha : \alpha \in A\})$, define $d_{A,r} = \min\{d' \in \mathbb{N} : |B^\perp_\prec(A, d')| \ge r\}$.

Thus, $d_{A,r}$ is the depth of the thinnest shell around $A$ that includes at least $r$ monomial vectors $f^\beta$ each of which is linearly independent of $\{f^\alpha\}_A$. Since $\emptyset = B^\perp_\prec(A, 0) \subset B^\perp_\prec(A, 1) \subset \cdots$, it follows from Proposition 6 that $d_{A,r}$ is well defined. Thus, $B^\perp_\prec(A, d_{A,r})$ contains at least $r$ candidate indices, each of which gives rise to a linearly independent expansion of $\{f^\alpha\}_A$. For the ensuing discussion, let us make the (dependent) intercept parameter $\lambda_0 = -\psi(\lambda)$ explicit. As before, $P_A$ is the unique solution to the maximum entropy problem (4.2) with constraints $\mathrm{E}_{P_A} f^\alpha = \mathrm{E}_P f^\alpha$, $\alpha \in A \subset \mathbb{N}^N$, now explicitly including the normalization constraint with $\alpha = 0$.

Proposition 7. Let $P$ be a strictly positive probability distribution on a finite G-invariant set $\Omega \subset \mathbb{R}^m$. Let $f_1, \ldots, f_N$ be a minimal generating set for $\mathbb{R}[x_1, x_2, \ldots, x_m]^G$. The algorithm below then halts for some $l^* \le M - 1$ and $P_{l^*} = \mathcal{R}^*(P)$.

Adaptive minimax construction of G-invariant models (with $P^G = \mathcal{R}^*(P)$ and $P_l = P_{A_l}$)

    $A_0 := \{0\}$; $l := 0$;
    while $D(P^G\|P_l) > 0$
        $l := l + 1$;
        $\alpha_l^* := \arg\min_{\alpha\in B^\perp_\prec(A_{l-1},\, d_{A_{l-1},r})} D(P\|P_{A_{l-1}\cup\{\alpha\}})$; $A_l := A_{l-1} \cup \{\alpha_l^*\}$;
    end while

Proof. Clearly, halting of the algorithm is determined by the membership of $\ln(p^G) = \ln(\mathcal{R}(p))$, the log-probability mass function of $P^G$, in $\mathrm{Span}\{f^\alpha : \alpha \in A_{l^*}\}$. Evidently, Proposition 6 guarantees that this occurs for $l^* \le M - 1$. $\square$
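The construction of Proposition 7 can be illustrated on a one-dimensional toy problem. This is a sketch under stated assumptions rather than the procedure used in the paper: the group is the hypothetical $\{+1, -1\}$ acting on six centred levels, the single generator is $f(x) = x^2$ (so exponents are plain integers and any three distinct powers are linearly independent on the three orbits), and the ME fit is done by a damped Newton iteration on the convex dual. All names (`maxent`, `lookahead`, `adaptive_minimax`) are illustrative.

```python
import math

levels = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]      # finite Omega, invariant under x -> -x
target = [0.30, 0.05, 0.20, 0.10, 0.15, 0.20]    # strictly positive, not symmetric
sym = [(a + b) / 2 for a, b in zip(target, reversed(target))]  # symmetrized target
M = 3                                            # number of orbits {x, -x}

def feature(a):
    """Invariant monomial f^a for the single generator f(x) = x^2."""
    return [x ** (2 * a) for x in levels]

def kl(p, q):
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def solve(H, b):
    """Solve H z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(H)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            m = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= m * aug[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (aug[r][n] - sum(aug[r][k] * z[k] for k in range(r + 1, n))) / aug[r][r]
    return z

def maxent(exponents):
    """ME distribution matching the target moments E[f^a], a != 0 (damped Newton on the dual)."""
    F = [feature(a) for a in exponents if a != 0]
    mu = [sum(p * v for p, v in zip(target, f)) for f in F]
    lam = [0.0] * len(F)

    def pmf(l):
        s = [sum(li * f[k] for li, f in zip(l, F)) for k in range(len(levels))]
        top = max(s)
        w = [math.exp(v - top) for v in s]
        return [v / sum(w) for v in w], top + math.log(sum(w))

    def dual(l):
        return pmf(l)[1] - sum(li * m for li, m in zip(l, mu))

    for _ in range(100):
        p, _ = pmf(lam)
        mean = [sum(pk * v for pk, v in zip(p, f)) for f in F]
        grad = [m - t for m, t in zip(mean, mu)]
        if not F or max(abs(g) for g in grad) < 1e-12:
            break
        H = [[sum(pk * u * v for pk, u, v in zip(p, fi, fj)) - mi * mj
              for fj, mj in zip(F, mean)] for fi, mi in zip(F, mean)]
        step = solve(H, [-g for g in grad])
        t, base = 1.0, dual(lam)
        while dual([l + t * s for l, s in zip(lam, step)]) > base + 1e-12 and t > 1e-10:
            t /= 2                               # backtrack until the dual stops increasing
        lam = [l + t * s for l, s in zip(lam, step)]
    return pmf(lam)[0]

def lookahead(A, r):
    """Exponents in the thinnest shell around A that holds at least r new candidates."""
    d = 1
    while True:
        found = [a for a in range(max(A) + d + 1)
                 if a not in A and min(abs(a - b) for b in A) <= d]
        if len(found) >= r:
            return found
        d += 1

def adaptive_minimax(r=2, tol=1e-9):
    A, model = [0], maxent([0])
    while kl(sym, model) > tol and len(A) < M:
        candidates = lookahead(A, min(r, M - len(A)))
        best = min(candidates, key=lambda a: kl(target, maxent(A + [a])))
        A.append(best)
        model = maxent(A)
    return A, model

A, model = adaptive_minimax()
print(A, model)
```

The loop tests the divergence from the symmetrized target, while candidates are ranked by the divergence from the original target; by the decomposition in Remarks 4 the two rankings agree. The guard `len(A) < M` only protects against numerical tolerance issues and is an addition of this sketch.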

Remark 5. 1. Suppose that $P$ is an empirical distribution based on an i.i.d. sample from a member of the family (4.3). It can then be verified ([27], Section 5.6) that the same $\lambda$ that defines the ME distribution $P_l$ also maximizes the likelihood function in the family (4.3). Similarly, $\mathcal{R}^*(P)$ maximizes the likelihood within $\mathcal{P}^G$.

2. The condition of strict positivity of $P$ can certainly be relaxed in view of the first two of Remarks 3. Indeed, suppose that $P$ has zeros. The above procedure could be applied with $\Omega$ immediately reduced to $\Omega_+$, the union of the orbits of points with $P > 0$. $M$ is then to be understood as the number of such orbits. Alternatively, one can apply the original algorithm with full $\Omega$ until $\mathcal{N} \ne \emptyset$ is actually encountered. Only then would $\Omega$ be reduced (by $\mathcal{N}$). Note that currently independent $\{f^\alpha\}_{A_l}$ might become dependent. Replacing, in that case, $\{f^\alpha\}_{A_l}$ by any basis (perhaps chosen in a systematic way), the algorithm would continue on adding another constraint, and so on (possibly reducing $\Omega$ by another $\mathcal{N}$), until the current number of constraints $1 + l^* = |S_{\Omega\setminus\mathcal{N}}| \le M$, or as soon as the reduced vector of $\ln(p^G) \in \mathrm{Span}\{f^\alpha : \alpha \in A_{l^*}\}$. Note that, eventually, $\Omega$ again reduces to $\Omega_+$.



5. Example 

5.1. Microimage distributions 

We consider an example from the area of natural image statistics which, in its broad for- 
mulation, studies various statistics defined on digital (or digitized) images of sufficiently 
complex scenes and environments. For example, photographs of a natural landscape or 
an urban scene are complex, as opposed to a photograph of an artificially arranged scene 
with an isolated chair in an otherwise empty room. 

Let $\mathcal{I} = C_L^{h\times w}$ represent the space of $h \times w$ digital images with $L$ intensities per pixel; thus, $C_L = \{0, 1, \ldots, L - 1\}$. We will be interested in statistical regularities of the populations of $n \times n$ microimages, very small ($n \ll h, w$) subimages of images $i \in \mathcal{I}$. We denote the set $C_L^{n\times n}$ of all $n \times n$ microimages by $\Omega_L^n$.

To each image $i \in \mathcal{I}$, we associate a count vector

$$n(i) = (n_1(i), n_2(i), \ldots, n_K(i)), \tag{5.1}$$

where $n_k(i)$, $1 \le k \le K$, is the within-image frequency of the $k$th matrix from $C_L^{n\times n}$ under some fixed enumeration of $\Omega_L^n$ ($K = L^{n^2}$). We further assume that we observe $I_1, I_2, \ldots, I_{N_{\mathrm{im}}}$, i.i.d. random images from a hypothetical natural image distribution $\Xi$ on $\mathcal{I}$, and that $\Xi$ is such that the independent count vectors $n(I_1), n(I_2), \ldots, n(I_{N_{\mathrm{im}}})$ follow a multinomial distribution parameterized by unknown microimage probability distribution vector $(p_1, p_2, \ldots, p_K)$ and $N = (h - n + 1)(w - n + 1)$, the total number of $n \times n$ (overlapping) subimages in each image. Alternatively, the microimage distribution of image $i$ can be defined as the relative frequencies $p(\omega|i) = n_{k(\omega)}(i)/N$. The resulting assumption that $n(I_j)$, $j = 1, 2, \ldots, N_{\mathrm{im}}$, constitute an i.i.d. sample from a single multinomial distribution can certainly be debated.
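The count vector (5.1) is obtained by sliding an $n \times n$ window over the image. The following is a minimal sketch, assuming the image is given as a list of rows of integer intensities; the function name `microimage_counts` and the tiny example image are hypothetical.

```python
from collections import Counter

def microimage_counts(image, n):
    """Count every overlapping n x n subimage of a 2-D list of intensities."""
    h, w = len(image), len(image[0])
    counts = Counter()
    for r in range(h - n + 1):
        for c in range(w - n + 1):
            # A patch is stored as a tuple of row tuples so that it can serve as a key.
            patch = tuple(tuple(image[r + i][c:c + n]) for i in range(n))
            counts[patch] += 1
    return counts

image = [
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
counts = microimage_counts(image, 2)
N = sum(counts.values())                       # equals (h - n + 1) * (w - n + 1)
frequencies = {patch: c / N for patch, c in counts.items()}
print(N, len(counts))
```

Dividing the counts by $N$ gives the within-image relative frequencies $p(\omega|i)$; patches that never occur are simply absent from the counter.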

Typically, natural image statistics are studied on large collections of digital gray scale images of a particular origin (e.g., optical or range imaging) and a particular domain (e.g., landscapes, terrains). Particular imaging domains, such as urban scenes and natural landscapes, or even the totality of all visual experiences of the human eye ([30, 40]), are commonly described by a single probability distribution, such as $\Xi$, on $\mathcal{I}$. Certainly, the microimage distributions can vary with the origin and domain of the imagery, in short, with $\Xi$. Remarkably ([30]), certain properties of microimage samples (of optical imagery) do appear stable, regardless of the particular sampling scheme (of images as well as microimages within image) and the imaging domain. We can thus think of the "universal" microimage distribution $p$. We leave aside many other issues associated with generic image models [37]. To execute a specific task (e.g., object detection [16, 50] or segmentation [23]) efficiently and accurately, computer vision applications routinely operate with estimates of $p$, or, possibly, $p(\omega|A)$ with additional conditioning on a relevant semantic attribute $A$. Whereas some applications use discrete counting and do it directly on the image intensity spaces $\mathcal{I}$, others operate on spaces of derived (multi-)filter responses which naturally call for continuous densities. Largely independent of such details, an adequately incorporated estimate of $p$ can, in particular, increase the robustness of the application to various departures of the unseen image from the training ones. For a simple example, note that $p$ gives rise to useful models for generic background, or clutter, in the natural images. Thus, Figure 1 depicts a realization of the maximum entropy random field constrained to have all of its $2 \times 2$ marginals follow a G-invariant (Section 5.3) estimate of $p$ (obtained in [30]). These are some of the motivations for work on local statistics of natural images [19, 24, 30, 32, 34, 41].

5.2. Data 

The popular van Hateren collection consists of 4167 $1024 \times 1536$ two bytes/pixel ($L = 2^{16}$) raw images of natural and urban landscapes obtained with a Kodak DCS420 camera, "linearized with the lookup table generated by the camera for each image" [22]. This linear version, as opposed to the also widely used PSF-corrected ("deblurred") one, is used below. Also, 49 irregular images (42 of which appear extremely blurred, with the other seven being incorrectly oriented) have been excluded, resulting in the image sample size of $N_{\mathrm{im}} = 4118$. After minimal preprocessing (inward adjustment of the 0.5% extremes of the pixel intensity histogram within each image), the image intensities have been log-transformed ($i_{xy} \mapsto \log(1 + i_{xy})$, [34, 41]) for perceptual enhancement. In order to expedite our exposition, we quantize (uniformly) the dynamic range of each log-transformed image to $L = 4$ levels only. See Figure 2.
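The log transform and the uniform quantization of the dynamic range can be sketched as follows. The adjustment of the 0.5% histogram extremes is omitted here, and the function name `quantize_log` and the sample intensities are hypothetical, so this is an illustration of the two remaining steps rather than the exact preprocessing used for the data.

```python
import math

def quantize_log(image, levels):
    """Log-transform intensities, then quantize the image's own dynamic range uniformly."""
    logs = [[math.log(1 + v) for v in row] for row in image]
    lo = min(min(row) for row in logs)
    hi = max(max(row) for row in logs)
    if hi == lo:                                   # constant image: a single level
        return [[0 for _ in row] for row in logs]
    width = (hi - lo) / levels
    # The maximum falls on the upper edge of the last bin, hence the clamp.
    return [[min(int((v - lo) / width), levels - 1) for v in row] for row in logs]

raw = [[0, 3, 15], [63, 255, 1023]]
quantized = quantize_log(raw, 4)
print(quantized)
```

Quantizing per image, as in the text, makes the resulting levels relative to each image's own range rather than to a global intensity scale.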



5.3. The group G of microimage symmetries 

Based on similar data, the natural microimage distribution $p$ (Section 5.1) was observed in [30] to be nearly invariant to the full group $G$ of symmetries of $\Omega_L^n$ (Figure 1, bottom). $\Omega_L^n$ is here identified with the square-based cuboid whose bases correspond to the "all-dark" ($0$) and "all-bright" ($L - 1$) configurations. Evidence of the G-invariance has included visual inspection of graphs of various multidimensional local statistics [24], point estimates of probabilities of high contrast patches [19, 34] and P-values of statistical tests [30]. Invariance with respect to certain subgroups of $G$, such as the "left-right" and "up-down" symmetry transformations, is more pronounced than, for example, invariance with respect to the intensity inversion $\omega \mapsto (L - 1)\mathbf{1} - \omega$, where $\mathbf{1}$ is the matrix of all ones. Nonetheless, we here consider the entire group $G$ of the corresponding transformations and one can easily specialize the discussion to the subgroups of $G$.

Figure 1. Top: a synthetic image with $L = 4$ grey levels is obtained as a realization of the maximum entropy field under the constraints that all of its $2 \times 2$ marginals follow a G-invariant estimate of the natural microimage distribution $p$ (courtesy of Professor L. Younes, The Johns Hopkins University); bottom: the elements $\omega \in \Omega_4^2$ are displayed rowwise (top-down, left to right) in the descending order of their frequencies $p(\omega)$. G-invariance is more pronounced for the principal masses (top). 16 $\omega$'s have $p(\omega) = 0$ and 2 G-orbits have $p(O) = 0$ (the last ten $\omega$).

Figure 2. A natural image from van Hateren's collection [22] with $L = 256$ (top) and $L = 4$ (bottom) gray levels (log intensities are necessitated by the high resolution of the original images).

We define $G$ via its three generators, $g_r$, $g_s$ and $g_i$. Let $g_r$ represent the counterclockwise rotation of the square by $\pi/2$ and let $g_s$ stand for the reflection of the square through its secondary diagonal. The resulting subgroup of $G$ is isomorphic to $D_8$, the dihedral group of order 8, with the following presentation: $\langle g_r, g_s \mid g_r^4 = g_s^2 = e,\ g_r g_s = g_s g_r^3\rangle$. Recall that composite actions propagate right to left - for example, $g_r g_s\omega$ acts on $\omega$ by the diagonal reflection $g_s$ followed by the rotation $g_r$.

The last symmetry required to generate $G$ is that with respect to the photometric inversion $g_i\omega = (L - 1)\mathbf{1} - \omega$, $\omega \in \Omega_L^n$. Finally, the group $G$ generated by all of the above symmetries has presentation $\langle g_r, g_s, g_i \mid g_r^4 = g_s^2 = g_i^2 = e,\ g_s g_i = g_i g_s,\ g_r g_i = g_i g_r,\ g_r g_s = g_s g_r^3\rangle$. Therefore, $G = D_8 \times C_2$, where $C_2 = \langle g_i\rangle$ is the cyclic group of order two.

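In order to simplify computations (of the matrix representation of $G$), we shift the intensity ranges $C_L$ down by $(L - 1)/2$, thus redefining $\Omega_L^n$ as $\{-(L - 1)/2,\ -(L - 1)/2 + 1,\ \ldots,\ (L - 1)/2\}^{n\times n}$. We now also fix $n = 2$. With the standard basis for $\mathbb{R}^4$, the matrix representation of $G$ is generated by

$$\rho(g_r) = \begin{pmatrix} 0 & 0 & 0 & 1\\ 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad \rho(g_s) = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0 \end{pmatrix}, \quad \rho(g_i) = \begin{pmatrix} -1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{pmatrix}. \tag{5.2}$$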
The following proposition reveals the structure of $S_\Omega$ and gives the "complexity", or "size", of the G-invariant models, which is important for efficient computation of such models (e.g., via a matrix form of the Reynolds operator $\mathcal{R}$) [31].

Proposition 8. Let $L$ be even. Then, $|S_{\Omega_L^2}| = (L^4 + 2L^3 + 6L^2 + 4L)/16$. Among those, there are $L$ orbits of size two, $L^2/4$ orbits of size four, $(2L^3 + 3L^2 - 10L)/8$ orbits of size eight and $(L^4 - 2L^3 - 4L^2 + 8L)/16$ orbits of size sixteen.

A proof of the proposition appears in [31], Appendix E.
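The orbit structure for $n = 2$ can also be obtained by brute force. The sketch below assumes the action of the three generators read off from (5.2) (a cyclic shift of the four pixel positions, a swap of the second and fourth positions, and a sign change on centred intensities); the centred levels are doubled so that all arithmetic stays in the integers, which does not affect the orbits. Function names are hypothetical.

```python
from collections import Counter
from itertools import product

def rotate(w):
    """g_r: cyclic shift of the four pixel positions."""
    return (w[3], w[0], w[1], w[2])

def reflect(w):
    """g_s: swap of the second and fourth positions."""
    return (w[0], w[3], w[2], w[1])

def invert(w):
    """g_i: photometric inversion on centred intensities."""
    return tuple(-v for v in w)

def orbit(w):
    """Closure of {w} under the three generators."""
    seen, stack = {w}, [w]
    while stack:
        u = stack.pop()
        for g in (rotate, reflect, invert):
            v = g(u)
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return frozenset(seen)

def orbit_sizes(L):
    """Histogram of orbit sizes on the L^4 microimages with centred (doubled) levels."""
    levels = range(-(L - 1), L, 2)
    orbits = {orbit(w) for w in product(levels, repeat=4)}
    return Counter(len(o) for o in orbits)

sizes = orbit_sizes(4)
print(sum(sizes.values()), dict(sizes))
```

For $L = 4$ the total number of orbits found this way is 31, the figure quoted for the main example in the Conclusion.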

If we think of $\Xi$ on $\mathcal{I}$ (Section 5) as a discrete approximation of a fully continuous image random field model, then the invariance to $\langle g_r\rangle$, the rotation subgroup of $G$, can be thought of as a manifestation of isotropy of the continuous field; $\langle g_r\rangle$ is then (isomorphic to) a finite subgroup of $SO(2)$, the group of planar rotations.

5.4. A minimal set of generators of $\mathbb{R}[x_1, x_2, x_3, x_4]^G$

First, let us recall that, according to (5.2) and (2.2), the $G$ action on $\mathbb{R}[x_1, x_2, x_3, x_4]$ can be concisely expressed via the action of $g_r, g_s, g_i$, generators of $G$, on $x_1, x_2, x_3, x_4$, canonical generators of $\mathbb{R}[x_1, x_2, x_3, x_4]$:

$$\begin{aligned} g_r x_1 &= x_2; & g_r x_2 &= x_3; & g_r x_3 &= x_4; & g_r x_4 &= x_1;\\ g_s x_1 &= x_1; & g_s x_2 &= x_4; & g_s x_3 &= x_3; & g_s x_4 &= x_2;\\ g_i x_k &= -x_k, & k &= 1, 2, 3, 4. \end{aligned} \tag{5.3}$$



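Theorem 6. The following set of polynomials is a minimal set of generators of $\mathbb{R}[x_1, x_2, x_3, x_4]^G$:

$$\begin{aligned} f_1(x) &= (x_1 + x_3)(x_2 + x_4), & f_2(x) &= x_1 x_3 + x_2 x_4,\\ f_3(x) &= x_1^2 + x_2^2 + x_3^2 + x_4^2, & f_4(x) &= x_1 x_2 x_3 x_4,\\ f_5(x) &= (x_1^2 + x_3^2)(x_2^2 + x_4^2). \end{aligned} \tag{5.4}$$

Also, $\mathbb{R}[x_1, x_2, x_3, x_4]^G \cong \mathbb{R}[y_1, y_2, y_3, y_4, y_5]/I_F$, where

$$I_F = \{h \in \mathbb{R}[y_1, y_2, y_3, y_4, y_5] : h(f_1, f_2, f_3, f_4, f_5) = 0 \in \mathbb{R}[x_1, x_2, x_3, x_4]\} = (q) \tag{5.5}$$

and

$$\begin{aligned} q(y_1, y_2, y_3, y_4, y_5) = {}& 4y_3^2 y_4 + 8y_2 y_3 y_4 + 2y_2 y_3 y_5 - 2y_1^2 y_2 y_3\\ & + 16y_4^2 - 8y_1^2 y_4 - 8y_4 y_5 + 4y_2^2 y_5 + y_1^4 - 2y_1^2 y_5 + y_5^2. \end{aligned}$$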

A proof of the theorem is given in [31] and does not require familiarity with algebraic geometry or invariant theory. The generators and this proof were first obtained by the same author in [30] from first principles and then verified using Macaulay2 [20].
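Invariance of the five generators, and an algebraic relation among them, can be checked by direct evaluation on the $4^4 = 256$ microimages. This is a sketch under stated assumptions: the group generators act as in (5.2), the centred levels are doubled to keep the arithmetic exact, and the degree-eight relation coded in `relation` was re-derived for this sketch from the identities $f_1^2 = f_5 + 4f_4 + 2u$, with $u$ a root of a quadratic whose coefficients are polynomials in $f_2, \ldots, f_5$; it should be read as a consistency check, not as a quotation of the paper.

```python
from itertools import product

def generators(x):
    """The five invariant polynomials evaluated at a microimage x = (x1, x2, x3, x4)."""
    x1, x2, x3, x4 = x
    return (
        (x1 + x3) * (x2 + x4),
        x1 * x3 + x2 * x4,
        x1**2 + x2**2 + x3**2 + x4**2,
        x1 * x2 * x3 * x4,
        (x1**2 + x3**2) * (x2**2 + x4**2),
    )

def relation(y):
    """A polynomial relation of degree eight (in x) among the five generator values."""
    y1, y2, y3, y4, y5 = y
    return (4 * y3**2 * y4 + 8 * y2 * y3 * y4 + 2 * y2 * y3 * y5 - 2 * y1**2 * y2 * y3
            + 16 * y4**2 - 8 * y1**2 * y4 - 8 * y4 * y5 + 4 * y2**2 * y5
            + y1**4 - 2 * y1**2 * y5 + y5**2)

group_generators = (
    lambda w: (w[3], w[0], w[1], w[2]),   # rotation
    lambda w: (w[0], w[3], w[2], w[1]),   # diagonal reflection
    lambda w: tuple(-v for v in w),       # photometric inversion
)

points = list(product((-3, -1, 1, 3), repeat=4))   # doubled centred levels for L = 4
invariant = all(generators(g(w)) == generators(w) for w in points for g in group_generators)
relation_holds = all(relation(generators(w)) == 0 for w in points)
distinct_values = len({generators(w) for w in points})
print(invariant, relation_holds, distinct_values)
```

The number of distinct value vectors equals the number of orbits, which reflects the fact that a generating set of invariants of a finite group separates orbits.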

5.5. Models for p 

Thus, we model $4 \times 4 \times 4 \times 4$ frequency tables. We distinguish between two types of models, G-invariant and general, according to whether or not G-invariance is enforced. Let $\mathcal{P} = \{p_k > 0,\ 1 \le k \le K = 256,\ \sum_{k=1}^{K} p_k = 1\}$ and $\mathcal{P}^G = \{p \in \mathcal{P},\ p = \mathcal{R}(p)\}$ be the saturated and maximal G-invariant family of models, respectively. Note (Proposition 8) that $\mathcal{P}^G$ is of "size" $30 = M - 1$. Assuming, for the time being, strict positivity of the cell counts, or, equivalently, $\hat p$, the probability vector of the empirical microimage distribution,

$$\hat p(\omega) = \frac{1}{N_{\mathrm{im}}}\sum_{i=1}^{N_{\mathrm{im}}} p(\omega|I_i), \tag{5.6}$$

we write these and all of the other models considered in the log-linear form below, conforming to the framework of Proposition 7:

$$p_k(\lambda, A) = \exp\Bigl(\sum_{\alpha\in A}\lambda_\alpha f^\alpha(\omega_k)\Bigr), \tag{5.7}$$

where $A$ must now contain $0$. In the G-invariant case, all of the terms are the evaluations of the G-invariant monomials $f^\alpha(x) = f_1^{\alpha_1}(x) f_2^{\alpha_2}(x) f_3^{\alpha_3}(x) f_4^{\alpha_4}(x) f_5^{\alpha_5}(x)$ at the $K = 256$ points of $\Omega_4^2$. Disregarding the invariance, the terms $f^\alpha(\omega_k)$ are replaced by the evaluations of the original moments $x^\alpha = x_1^{\alpha_1} x_2^{\alpha_2} x_3^{\alpha_3} x_4^{\alpha_4}$ on $\Omega_4^2$ (Corollary 2). Hence, any sequence $\alpha_0 = 0, \alpha_1, \ldots, \alpha_{l-1} \in \mathbb{N}^4$ with $\dim(\mathrm{Span}\{x^{\alpha_j}\}_{j=0}^{l-1}) = l$ identifies a subfamily of the ordinary models. Likewise, any sequence $\alpha_0 = 0, \alpha_1, \ldots, \alpha_{l-1} \in \mathbb{N}^5$ with $\dim(\mathrm{Span}\{f^{\alpha_j}\}_{j=0}^{l-1}) = l$ identifies a subfamily of the G-invariant models.

We fix $\prec$ to be the graded lexicographic ordering relative to $f_1 < f_2 < f_3 < f_4 < f_5$ (and $x_1 < x_2 < x_3 < x_4$). That is, $\alpha \prec \beta$ if and only if $|\alpha| < |\beta|$, or $|\alpha| = |\beta|$ and the rightmost non-zero entry in $\beta - \alpha$ is positive.

Relative to $\prec$, models (5.7) will first be constructed stepwise, by simply adding the "smallest" term $f^\alpha$ that is not already in the span of the terms of the current model and is "larger" than those terms. For example, Table 1 lists the first 15 ordinary and first 15 G-invariant terms under $\prec$. Second, the greedy acceleration will be used. Finally, we will make ordinary and G-invariant terms compete in an automatic greedy construction.

5.6. Parameter estimation 

Let

$$\ell(\lambda) = \sum_{i=1}^{N_{\mathrm{im}}}\Bigl[\sum_{k=1}^{K} n_k(i)\sum_{\alpha\in A}\lambda_\alpha f^\alpha(\omega_k) - \sum_{k=1}^{K}\ln n_k(i)!\Bigr] \simeq \sum_{k=1}^{K} n_k\sum_{\alpha\in A}\lambda_\alpha f^\alpha(\omega_k) \tag{5.8}$$

be the log-likelihood (with $\simeq$ denoting equality up to terms that do not depend on $\lambda$) of $\lambda = (\lambda_{\alpha_0}, \lambda_{\alpha_1}, \ldots, \lambda_{\alpha_{l-1}})$ under a model (5.7) with $A = \{\alpha_0 = 0, \alpha_1, \ldots, \alpha_{l-1}\}$ and given the independent microimage counts $n(i)$, $1 \le i \le N_{\mathrm{im}}$, (5.1) and $n_k = \sum_{i=1}^{N_{\mathrm{im}}} n_k(i)$. Note that an i.i.d. multinomial sample of $N \times N_{\mathrm{im}}$ individual microimages would produce the same data and, subsequently, the same likelihood (5.8). Recall, then, that the maximum likelihood (ML) estimation under (5.7) from an i.i.d. multinomial sample is equivalent to solving for the ME distribution under the respective constraints (with the special treatment of the boundary case, Remarks 3). More general versions of this simple fact have been repeatedly rediscovered in several contexts (e.g., [15, 27]) with the oldest reference known to the author from [39], page 14, dating back to the 1960's. Thus, in particular, the ML estimate $\lambda^*$ is unique in this case. We compute $\lambda^*$ using the ME formulation

$$\mathrm{E}_p f^\alpha = \mathrm{E}_{\hat p} f^\alpha, \quad \alpha \in A, \quad\text{where } \hat p \text{ is as in (5.6)}, \tag{5.9}$$

and a standard numerical solver in Matlab instead of more dedicated procedures (e.g., the improved iterative scaling of [10] or the Younes stochastic gradient used in [52]). This works well, even with ordinary moments since $K = 256$ is relatively small in our case. Our data and constructed sets $A$ are such that violation of the positivity constraint is immaterial for the parameter estimation in view of Remarks 3. Indeed, note that if $\hat p$ has zeros, but there exists a strictly positive distribution $p'$ satisfying (5.9), then $\mathcal{N} = \emptyset$ (first of Remarks 3) and the ME problem still has (5.7) as its unique solution where $\lambda^*$ is the unique solution to the constraint equations and is also the unique ML estimate of $\lambda$. Except in the saturated and maximal G-invariant models, the above condition is always satisfied by the models produced in our constructions. In fact, we terminate our model construction before this condition is to be violated should we add more constraints. In experiments presented in Section 5.7 below, we manage to construct models of high complexity (i.e., nearly 200 parameters) before running into numerical instabilities or the boundary situation ($\mathcal{N} \ne \emptyset$).

Table 1. The first 15 lowest ordinary (top) and G-invariant (bottom) terms under $\prec$

Finally, $\hat p$ is trivially the ME distribution subject to the normalization constraint alone. (Clearly, this corresponds to the ML estimate under the saturated model parameterized by the point masses.) Similarly, $\hat p^G = \mathcal{R}(\hat p)$ is the ME distribution with constraints on the probability of G-orbits (i.e., expectation of the indicators $I_O$ (4.5)). Equivalently, $\hat p^G$ is the ML estimate under the maximal G-invariant family $\mathcal{P}^G$ (parameterized by the orbit masses).

5.7. Results 

Figure 3 shows the results of fitting models with increasing numbers of terms constructed from ordinary and G-invariant monomials, with ("accelerated") and without the greedy lookahead. When producing ordinary terms $x^\alpha$, we have used $r = 200$, but would only examine a random subset of 50 instead of the entire $B^\perp_\prec(A_{l-1}, d_{A_{l-1},r})$. With the invariant terms $f^\alpha$, $r = 50$ with 25 randomly sampled terms. Thus, when allowing simultaneously both $x^\alpha$ and $f^\alpha$, the lookahead optimization would be over 75 mixed terms, each of which is outside the span of the current mixed expansion. Several reruns have not revealed any significant variation of the results. The effect of acceleration is more evident when using the ordinary terms (the top and middle curves in the bottom plot of Figure 3).
The results clearly indicate that the G-invariant mixed moments are useful for producing parsimonious models.

Figure 3. Log-linear models with increasing numbers of ordinary and G-invariant terms are fit to $4 \times 4 \times 4 \times 4$ microimage frequency data. The constant term is always included first. G-invariant and ordinary models are focused on in the top and bottom plots, respectively. The maximal G-invariant model ($\mathcal{P}^G$) has 30 free parameters, but its four-parameter simplification with three non-constant G-invariant terms ($f_3$, $f_1$ and $f_1^2$) nearly achieves the best G-invariant fit. These terms, together with another three G-invariant terms ($f_1 f_3$, $f_4$, $f_3^2$), are also included by the accelerated constructor in the best-fitting ten-parameter model.

6. Conclusion

Clearly, model reduction due to the described type of invariance is limited by $|S_\Omega| \ge |\Omega|/|G|$ as $|G|$ is the maximal orbit size ($31 > 256/16$ in our main example). Thus, as $\Omega$ grows (e.g., as finer quantization of $\Omega_L^n$ produces more levels $L$), the marginal reduction diminishes ($|G|$ remains constant). However, the very ability to explore interactions directly within a G-invariant model space, or even across several such spaces with distinct groups $G$ (including the trivial one), appears to be valuable for building simple models. This is especially so given the recently increased availability of suitable algebraic software for finding the generators. Note that G-invariant linear subspaces of $(\mathbb{R}^K)^G \subset \mathbb{R}^K$ can also be produced directly using the group action. Namely, both $S_\Omega$ and the Reynolds operator $\mathcal{R}$ can be easily computed from first principles. Then, in principle, any modeling terms $q : \Omega \to \mathbb{R}$ can be projected to $(\mathbb{R}^K)^G$ via $\mathcal{R}(q)$. However, these computations would need to be repeated should another (G-invariant) $\Omega \subset \mathbb{R}^m$ (with the same G-action) be considered. The advantage of having the generators is then evident. This is especially so for continuous density estimation when $\Omega$ (and hence $S_\Omega$) is infinite, precluding exact computation of $\mathcal{R}$.
Also, note that computations of the parameter estimates in the presence of G-invariance reduce appropriately when transferred properly to the factor space $S_\Omega$. This is intuitively clear for any modeling framework and has been shown in detail for the ME approach in [31], Section 6, and used in the present experiments.

Finally, it is likely that the present way of producing sets $B_\prec(A_{i-1}, d_{A_{i-1}}, r)$ can
be improved, at least for some common orderings $\prec$. Namely, in order to produce $r$
terms $f^\alpha$ outside $\mathrm{Span}(\{f^{\alpha'}\}_{\alpha' \in A})$ ($|A| < M = |S_\Omega| = \dim((\mathbb{R}^\Omega)^G)$), we presently
generate a large list of candidates ($\alpha \succ \max_\prec(A)$) and test its members for membership
in $\mathrm{Span}(\{f^{\alpha'}\}_{\alpha' \in A})$ using standard (numerical or symbolic) linear algebra tools (i.e., the
rank function) of Matlab. At the same time, there may be more efficient and reliable
methods. In particular, the rapidly developing field of algebraic statistical modeling
[12, 18, 39] might offer an approach taking advantage of the following isomorphisms.
First, $\mathbb{R}[x_1, x_2, \ldots, x_m]^G \cong \mathbb{R}[y_1, y_2, \ldots, y_N]/I_f$ [6], page 339, where $I_f$ is the ideal of the
relations among the generators $f_1, f_2, \ldots, f_N$ (see also the discussion of Example 1
after Proposition 3). Next, let $F : \mathbb{R}[y_1, y_2, \ldots, y_N]/I_f \to \mathbb{R}[x_1, x_2, \ldots, x_m]^G$ via $[y_i] \mapsto f_i$,
$1 \le i \le N$. Let $\phi$ be some injection (e.g., division by an appropriate Gröbner basis of $I_f$)
of $\mathbb{R}[x_1, x_2, \ldots, x_m]^G$ into $\mathbb{R}[y_1, y_2, \ldots, y_N]/I_f$ so that $F \circ \phi$ is the identity. Then, since $\Omega$
is finite (and $G$-invariant),

$$(\mathbb{R}^\Omega)^G \cong \mathbb{R}[x_1, x_2, \ldots, x_m]^G / I(\Omega)^G \cong \bigl(\mathbb{R}[y_1, y_2, \ldots, y_N]/I_f\bigr) \big/ \phi(I(\Omega)^G),$$

where $I(\Omega)^G = \{q \in \mathbb{R}[x_1, x_2, \ldots, x_m]^G : q(\omega) = 0 \ \forall \omega \in \Omega\}$ is the ideal of $G$-invariant
polynomials vanishing on $\Omega$ ($I(\Omega)^G = I(\Omega) \cap \mathbb{R}[x_1, x_2, \ldots, x_m]^G$, $I(\Omega)$ is the ideal of $\Omega$).
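The rank-based membership test described in this section can be sketched on a toy case. The example below assumes $G = \{\mathrm{id}, \mathrm{swap}\}$ acting on $\mathbb{R}^2$ by exchanging coordinates, $\Omega = \{1, 2, 3\}^2$, and the elementary symmetric polynomials $f_1 = x_1 + x_2$, $f_2 = x_1 x_2$ as generators. The graded lexicographic ordering, the exact rational elimination (standing in for Matlab's rank function) and all names are illustrative choices, not the procedure used in the experiments.

```python
from fractions import Fraction
from itertools import product

def rank(rows):
    """Exact rank of a list of equal-length rows (Gaussian elimination over Q)."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            if m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

# Toy setting (assumed): one representative (a, b) with a <= b per orbit of the swap.
levels = (1, 2, 3)
reps = [(a, b) for a in levels for b in levels if a <= b]
M = len(reps)  # number of orbits = dimension of the invariant functions on Omega

def f(alpha, w):
    """Monomial f1^alpha[0] * f2^alpha[1] in the generators, evaluated at w."""
    x1, x2 = w
    return (x1 + x2) ** alpha[0] * (x1 * x2) ** alpha[1]

# Candidate exponents up to total degree M - 1, in graded lexicographic order.
candidates = sorted(
    (a for a in product(range(M), repeat=2) if sum(a) <= M - 1),
    key=lambda a: (sum(a), a),
)

selected, rows = [], []
for alpha in candidates:
    row = [f(alpha, w) for w in reps]
    if rank(rows + [row]) > len(rows):  # candidate lies outside the current span
        selected.append(alpha)
        rows.append(row)
    if len(selected) == M:  # the invariant function space is exhausted
        break
```

Because an invariant function is determined by its values on one representative per orbit, each candidate is evaluated on the $M = 6$ representatives only, and the loop stops once $M$ linearly independent terms have been found. Every later candidate is then a linear combination of the selected terms on $\Omega$, which is the redundancy that the quotient by $\phi(I(\Omega)^G)$ expresses algebraically.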

Acknowledgements 

I am grateful to Professor D. Geman (The Johns Hopkins University) for introducing me 
to statistics of natural images and related statistical learning paradigms which have led to
this work. I thank Professors A. van der Vaart and R. Gill, and my other former colleagues
at EURANDOM (the Netherlands) for helpful discussions. I am thankful to Professor 
D. Rumynin (Warwick University) for consulting me on several of the algebraic issues of 
this work. I also thank anonymous referees and the Editors for their critical remarks and 
suggestions for revision of the original manuscript. 

References 

[1] Barron, A.R. and Sheu, C.-H. (1991). Approximation of density functions by sequences of
exponential families. Ann. Statist. 19 1347-1369. MR1126328

[2] Berg, C. (1996). Moment problems and polynomial approximation. Ann. Fac. Sci. Toulouse
Math. (6) 9-32. MR1462705

[3] Billingsley, P. (1995). Probability and Measure, 3rd ed. New York: Wiley. MR1324786

[4] Computational Algebra Group School of Mathematics & Statistics University of Sydney
(2003). The Magma Computational Algebra System. Release Notes V2.10. Computational
Algebra Group School of Mathematics & Statistics University of Sydney. Available at
http://magma.maths.usyd.edu.au/magma.

[5] Cover, T.M. and Thomas, J.A. (1991). Elements of Information Theory. New York: Wiley.
MR1122806

[6] Cox, D., Little, J. and O'Shea, D. (1997). Ideals, Varieties, and Algorithms, 2nd ed. New
York: Springer. MR1417938

[7] Csiszár, I. (1975). I-divergence geometry of probability distributions and minimization
problems. Ann. Probab. 3 146-158. MR0365798

[8] Csiszár, I. (1991). Why least squares and maximum entropy? An axiomatic approach to
inference for linear inverse problems. Ann. Statist. 19 2032-2066. MR1135163

[9] de Jeu, M. (2003). Determinate multidimensional measures, the extended Carleman
theorem and quasi-analytic weights. Ann. Probab. 31 1205-1227. MR1988469

[10] Della Pietra, S., Della Pietra, V. and Lafferty, J. (1997). Inducing features of random fields.
IEEE Trans. PAMI 19 45-47.

[11] Derksen, H. and Kemper, G. (2002). Computational Invariant Theory. Berlin: Springer.
MR1918599

[12] Drton, M. and Sullivant, S. (2007). Algebraic statistical models. Statist. Sinica 14
1273-1297.

[13] Dummit, D.S. and Foote, R.M. (1991). Abstract Algebra. Englewood Cliffs, NJ: Prentice
Hall Inc. MR1138725

[14] Durrett, R. (1996). Probability: Theory and Examples, 2nd ed. Belmont, CA: Duxbury
Press. MR1609153

[15] Dykstra, R.L. and Lemke, J.H. (1988). Duality of I projections and maximum likelihood
estimation for log-linear models under cone constraints. J. Amer. Statist. Assoc. 83
546-554. MR0971385

[16] Fleuret, F. and Geman, D. (2001). Coarse-to-fine face detection. Int. J. Comput. Vision
41 85-107.

[17] Fogarty, J. (2001). On Noether's bound for polynomial invariants of a finite group.
Electron. Res. Announc. Amer. Math. Soc. 7 5-7. MR1826990

[18] Geiger, D., Meek, C. and Sturmfels, B. (2006). On the toric algebra of graphical models.
Ann. Statist. 34 1463-1492. MR2278364




[19] Geman, D. and Koloydenko, A. (1999). Invariant statistics and coding of natural microim- 
ages. In IEEE Workshop Statist. Comput. Theories of Vision (S. Zhu, ed.). Published 
on web at http://www.stat.ucla.edu/~sczhu/Workshops/sctv99/Gemanl.html. 

[20] Grayson, D. and Stillman, M. Macaulay 2, a software system for research in algebraic 
geometry. Available at http://www.math.uiuc.edu/Macaulay2/. 

[21] Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning. 
New York: Springer. MR1851606 

[22] Hateren, J.H.V. and Schaaf, A.V.D. (1998). Independent component filters of natural im- 
ages compared with simple cells in primary visual cortex. Proc. R. Soc. Lond. B 265 
359-366. 

[23] Heiler, M. and Schnorr, C. (2003). Natural image statistics for natural image segmentation. 
In Proc. of the 9th ICCV 1259-1266. IEEE Computer Society. 

[24] Huang, J. and Mumford, D. (1999). Statistics of natural images and models. In Proc. of 
CVPR 541-547. IEEE Computer Society. 

[25] Ishwar, P. and Moulin, P. (2005). On the existence and characterization of the maxent 
distribution under general moment inequality constraints. IEEE Trans. Inform. Theory 
51 3322-3333. MR2240198 

[26] Junk, M. (2000). Maximum entropy for reduced moment problems. Math. Models Methods 
Appl. Sci. 10 1001-1025. MR1780147 

[27] Jupp, P.E. and Mardia, K.V. (1983). A note on the maximum-entropy principle. Scand. J. 
Statist. 10 45-47. MR0711335 

[28] Keich, U. (1999). Krein's strings, the symmetric moment problem, and extending a real 
positive definite function. Comm. Pure Appl. Math. 52 1315-1334. MR1699971 

[29] Kemper, G. INVAR. A Maple package for invariant theory of finite groups. Available from 
http://www.iwr.uni-heidelberg.de/groups/compalg/kemper/invar.html.

[30] Koloydenko, A. (2000). Modeling natural microimage statistics. Ph.D. thesis,
Univ. Massachusetts Amherst. Available at
http://www.maths.nottingham.ac.uk/personal/pmzaak/thesis.pdf.

[31] Koloydenko, A. (2006). Symmetric measures via moments. Technical Report 06-07,
School of Mathematical Sciences, Nottingham Univ., UK. Available at
http://www.maths.nottingham.ac.uk/personal/pmzaak/InvModels.pdf.

[32] Koloydenko, A. and Geman, D. (2006). Ordinal coding of image microstructure. In Int. 
Conf. Image Proc, Comput. Vision, Pattern Recogn. (H.R. Arabnia, ed.) 613-620. 
CSREA Press. 

[33] Lauritzen, S.L. (1996). Graphical Models. New York: The Clarendon Press Oxford Univer- 
sity Press. MR1419991 

[34] Lee, A.B., Pedersen, K.S. and Mumford, D. (2003). The nonlinear statistics of high- 
contrast patches in natural images. Internat. Comput. Vision 54 83-103. Available 
at http://dx.doi.org/10.1023/A:1023705401078.

[35] Lehmann, E.L. (1997). Testing Statistical Hypotheses, 2nd ed. New York: Springer. 
MR1481711 

[36] McCullagh, P. (2002). What is a statistical model? (with discussion). Ann. Statist. 30
1225-1310. MR1936320 

[37] Mumford, D. and Gidas, B. (2001). Stochastic models for generic images. Quart. Appl. 
Math. 59 85-111. MR1811096 

[38] Olver, P.J. (1999). Classical Invariant Theory. Cambridge Univ. Press. MR1694364 

[39] Pachter, L. and Sturmfels, B., eds. (2005). Algebraic Statistics for Computational Biology. 
Cambridge Univ. Press. MR2205865 



[40] Pedersen, K. (2003). Statistics of natural image geometry. Ph.D. thesis, Dept. Computer
Science, Univ. Copenhagen.

[41] Pedersen, K. and Lee, A. (2002). Toward a full probability model of edges in natural images.
In 7th ECCV 1 (A. Heyden, G. Sparr, M. Nielsen and P. Johansen, eds.) 328-342.
Springer.

[42] Schervish, M.J. (1995). Theory of Statistics. New York: Springer. MR1354146

[43] Shore, J.E. and Johnson, R.W. (1980). Axiomatic derivation of the principle of maximum
entropy and the principle of minimum cross-entropy. IEEE Trans. Inform. Theory 26
26-37. MR0560389

[44] Smith, L. (1995). Polynomial Invariants of Finite Groups. Wellesley, MA: A K Peters Ltd.
MR1328644

[45] Stoyanov, J. (2000). Krein condition in probabilistic moment problems. Bernoulli 6
939-949. MR1791909

[46] Sturmfels, B. (1993). Algorithms in Invariant Theory. Vienna: Springer. MR1255980

[47] The GAP Group (2002). GAP - Groups, Algorithms, and Programming, Version 4.3. The
GAP Group. Available at http://www.gap-system.org.

[48] Topsøe, F. (2001). Basic concepts, identities and inequalities - the toolkit of information
theory. Entropy 3 162-190. MR1885051

[49] Viana, M. (2005). Symmetry studies. Technical Report 027, Eurandom, Eindhoven,
The Netherlands. Available at http://www.eurandom.nl/reports/2005/027mVreport.pdf.

[50] Viola, P. and Jones, M. (2004). Robust real-time object detection. Int. J. Comput. Vision
57 137-154.

[51] Zhu, S., Lanterman, A. and Miller, M. (1998). Clutter modeling and performance and
analysis in automatic target recognition. In Workshop on Detection Classification of
Difficult Targets 477-496. Redstone Arsenal.

[52] Zhu, S., Wu, Y. and Mumford, D. (1997). Minimax entropy principle and its application
to texture modeling. Neural Computation 9 1627-1660.

Received August 2006 and revised September 2007 



